Genes involved in inflammatory bowel diseases and use thereof

ABSTRACT

The invention concerns genes involved in inflammatory and/or immune diseases and some cancers, in particular intestinal cryptogenic inflammatory diseases, and proteins coded by said genes. The invention also concerns methods for diagnosing inflammatory diseases.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 12/370,543, filed Feb. 12, 2009, which application is a continuation of U.S. application Ser. No. 10/240,046, filed Jan. 15, 2003, now U.S. Pat. No. 7,592,437, which issued on Sep. 22, 2009, which application was a National Stage application of PCT/FR01/00935, filed Mar. 27, 2001, which application claims priority to FR 0003832, filed Mar. 27, 2000, all of which are incorporated by reference in their entirety.

The present invention relates to genes involved in inflammatory and/or immune diseases and certain cancers, in particular cryptogenic inflammatory bowel diseases, and also to the proteins encoded by these genes. The present invention also relates to methods for diagnosing inflammatory diseases.

Cryptogenic inflammatory bowel diseases (IBDs) are diseases characterized by an inflammation of the digestive tract, the cause of which is unknown. Depending on the location and the characteristics of the inflammation, two different nosological entities are distinguished: ulcerative colitis (UC) and Crohn's disease (CD). UC was described by S. Wilkes in 1865, whereas the first case of regional ileitis was reported by Crohn in 1932. In reality, it is possible that these two diseases go back much further.

IBDs are chronic diseases which evolve throughout life and which affect approximately 1 to 2 individuals per 1000 inhabitants in western countries, which represents between 60,000 and 100,000 individuals suffering from these diseases in France. They are diseases which appear in young individuals (peak incidence is in the third decade), progressing via attacks interspersed with remissions, with frequent complications such as undernutrition, retarded growth in children, bone demineralization and, in the end, malignant degeneration to colon cancer. No specific treatment exists. Conventional therapeutics make use of anti-inflammatories, of immunosuppressants and of surgery. All these therapeutic means are, themselves, a source of considerable iatrogenic morbidity. For all these reasons, IBDs appear to be a considerable public health problem.

The etiology of IBDs is currently unknown. Environmental factors are involved in the occurrence of the disease, as witnessed by the secular increase in incidence of the disease and the incomplete concordance in monozygotic twins. The only environmental risk factors currently known are 1) tobacco, the role of which is harmful in CD and beneficial in UC, and 2) appendectomy, which has a protective role for UC.

Genetic predisposition has been suspected for a long time due to the existence of ethnic and familial aggregation of these diseases. In fact, IBDs are more common in the Caucasian population, and in particular in the Jewish population of central Europe. Familial forms represent from 6 to 20% of IBD cases. They are particularly common when the disease begins early. However, it is studies in twins which have made it possible to confirm the genetic nature of these diseases. In fact, the concordance rate between twins for these diseases is greater in monozygotic twins than in dizygotic twins, which argues strongly in favor of a hereditary component to IBDs, in particular to CD. In all probability, IBDs are complex genetic diseases involving several different genes, interacting with one another and with environmental factors. IBDs can therefore be classified as multifactorial diseases.

Two major strategies have been developed in order to identify the IBD-susceptibility genes. The first is based on the analysis of genes which are candidates for physiopathological reasons. Thus, many genes have been proposed as potentially important for IBDs. They are often genes which have a role in inflammation and the immune response. Mention may be made of the HLA, TAP, TNF and MICA genes, T lymphocyte receptor, ICAM1, interleukin 1, CCR5, etc. Other genes participate in diverse functions, such as GAI2, motilin, MRAMP, HMLH1, etc. In reality, none of the various candidate genes studied has so far been definitively proven to have a role in the occurrence of IBDs.

The recent development of human genome maps using highly polymorphic genetic markers has enabled geneticists to develop a nontargeted approach over the entire genome. This approach, also called reverse genetics or positional cloning, makes no hypothesis regarding the genes involved in the disease and attempts to discover them through systematic screening of the genome. The method most used for complex genetic diseases is based on studying identity by descent of the affected individuals of the same family. This value is calculated for a large number (300-400) of polymorphism markers distributed evenly (every 10 cM) over the genome. In the case of excess identity between affected individuals, the marker(s) tested indicate(s) a region supposed to contain a gene for susceptibility to the disease. In the case of complex genetic diseases, since the model underlying the genetic predisposition (number of genes and relative importance of each of them) is unknown, the statistical methods to be used will have to be adjusted.

The present invention relates to the demonstration of the nucleic acid sequence of genes involved in IBDs, and other inflammatory diseases, and also the use of these nucleic acid sequences.

In the context of the present invention, preliminary studies by the inventors have already made it possible to locate a CD-susceptibility gene. Specifically, the inventors (Hugot et al., 1996) have shown that a CD-susceptibility gene is located in the pericentromeric region of chromosome 16 (FIG. 1). It was the first gene for susceptibility to a complex genetic disease located by positional cloning and satisfying the strict criteria proposed in the literature (Lander and Kruglyak, 1995). This gene was named IBD1 (for inflammatory bowel disease 1). Since then, other locations have been proposed by other authors, in particular on chromosomes 12, 1, 3, 6 and 7 (Satsangi et al., 1996; Cho et al., 1998). Although they have been located, it has currently not been possible to identify any of these IBD-susceptibility genes.

Some authors have not been able to replicate this location (Rioux et al., 1998). This is not, however, surprising in the case of complex genetic diseases in which genetic heterogeneity is probable.

It is interesting to note that, according to the same approach of positional cloning, locations have also been proposed on chromosome 16 for several immune and inflammatory diseases, such as ankylosing spondylarthritis, Blau's syndrome, psoriasis, etc. (Becker et al., 1998; Tromp et al., 1996). All these diseases may then share the same gene (or the same group of genes) located on chromosome 16.

The maximum of the genetic linkage tests is virtually always located at the same position, in the region of D16S409 or D16S411, which are separated by only 2 cM. This result contrasts with the considerable size (usually greater than 20 cM) of the confidence interval which can be attributed to the genetic location according to an approach using nonparametric linkage analyses.

Comparison of the statistical tests used in the studies by the inventors shows that the tests based on complete identity by descent (Tz2) are better than the tests based on the mean of identity by descent (Tz) (FIG. 1). Such a difference can be explained by a recessive effect of IBD1.

Several genes known to be in the pericentromeric region of chromosome 16, such as the interleukin 4 receptor, CD19, CD43 or CD11, appear to be good potential candidates for CD. Preliminary results do not, however, argue in favor of these genes being involved in CD.

In particular, the present invention provides not only the sequence of the IBD1 gene, but also the partial sequence of another gene, called IBD1prox due to it being located in proximity to IBD1, and demonstrated as reported in the examples below. These genes, the cDNA sequence of which corresponds, respectively, to SEQ ID No. 1 and SEQ ID No. 4, are therefore potentially involved in many inflammatory and/or immune diseases and also in cancers.

The peptide sequence expressed by the IBD1 and IBD1prox genes is represented by SEQ ID No. 2 and SEQ ID No. 5, respectively; the genomic sequence of these genes is represented by SEQ ID No. 3 and SEQ ID No. 6, respectively.

Thus, a subject of the present invention is a purified or isolated nucleic acid, characterized in that it comprises a nucleic acid sequence chosen from the following group of sequences:

-   a) SEQ ID No. 1, SEQ ID No. 3, SEQ ID No. 4 and SEQ ID No. 6;
-   b) the sequence of a fragment of at least 15 consecutive nucleotides of a sequence chosen from SEQ ID No. 1, SEQ ID No. 3, SEQ ID No. 4 or SEQ ID No. 6;
-   c) a nucleic acid sequence having a percentage identity of at least 80%, after optimal alignment, with a sequence defined in a) or b);
-   d) a nucleic acid sequence which hybridizes, under high stringency conditions, with a nucleic acid sequence defined in a) or b);
-   e) the complementary sequence or the RNA sequence corresponding to a sequence as defined in a), b), c) or d).
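Items b) and e) of the list above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the helper names are hypothetical, the input is a toy DNA string rather than any of the SEQ ID sequences, and only the unambiguous bases A, C, G and T are handled.

```python
# Illustrative helpers (hypothetical names); toy sequences only, not SEQ ID data.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5'->3' (item e)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Return the RNA sequence corresponding to a DNA sequence (item e)."""
    return dna.upper().replace("T", "U")


def consecutive_fragments(dna: str, length: int = 15) -> list[str]:
    """List every fragment of `length` consecutive nucleotides (item b)."""
    dna = dna.upper()
    return [dna[i:i + length] for i in range(len(dna) - length + 1)]


def is_fragment_of(candidate: str, reference: str, min_len: int = 15) -> bool:
    """True if `candidate` has at least `min_len` bases and occurs in `reference`."""
    return len(candidate) >= min_len and candidate.upper() in reference.upper()


toy = "ATGGCGTACGTTAGCCTAGGA"  # arbitrary 21-base example
print(reverse_complement(toy))
print(to_rna(toy))
print(len(consecutive_fragments(toy)))  # 21 - 15 + 1 windows
```

The 15-nucleotide threshold is the one stated in item b); it is exposed as a parameter so that other lengths can be tried.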

The nucleic acid sequence according to the invention defined in c) has a percentage identity of at least 80%, after optimal alignment, with a sequence as defined in a) or b) above, preferably 90%, most preferably 98%.

The terms “nucleic acid”, “nucleic acid sequence”, “polynucleotide”, “oligonucleotide”, “polynucleotide sequence” and “nucleotide sequence”, terms which will be employed interchangeably in the present description, are intended to denote a precise series of nucleotides, which may or may not be modified, making it possible to define a fragment or a region of a nucleic acid, which may or may not comprise unnatural nucleotides, and which may correspond equally to a double-stranded DNA, a single-stranded DNA and transcription products of said DNAs. Thus, the nucleic acid sequences according to the invention also encompass PNAs (Peptide Nucleic Acids), or the like.

It should be understood that the present invention does not relate to the nucleotide sequences in their natural chromosomal environment, that is to say in the natural state. They are sequences which have been isolated and/or purified, that is to say they have been taken directly or indirectly, for example by copying, their environment having been at least partially modified. Thus, nucleic acids obtained by chemical synthesis are also intended to be denoted.

For the purpose of the present invention, the term “percentage identity” between two nucleic acid or amino acid sequences is intended to denote a percentage of nucleotides or of amino acid residues which are identical between the two sequences to be compared, obtained after the best alignment, this percentage being purely statistical and the differences between the two sequences being distributed randomly and over their entire length. The term “best alignment” or “optimal alignment” is intended to denote the alignment for which the percentage identity determined as below is highest. Sequence comparisons between two nucleic acid or amino acid sequences are conventionally carried out by comparing these sequences after having aligned them optimally, said comparison being carried out by segment or by “window of comparison” so as to identify and compare local regions of sequence similarity. The optimal alignment of the sequences for the comparison may be carried out, besides manually, by means of the local homology algorithm of Smith and Waterman (1981), by means of the global alignment algorithm of Needleman and Wunsch (1970), by means of the similarity search method of Pearson and Lipman (1988), by means of computer programs using these algorithms (GAP, BESTFIT, BLAST P, BLAST N, FASTA and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.). In order to obtain the optimal alignment, the BLAST program is preferably used, with the BLOSUM 62 matrix. The PAM or PAM250 matrices may also be used.

The percentage identity between two nucleic acid or amino acid sequences is determined by comparing these two sequences aligned optimally, the nucleic acid or amino acid sequence to be compared possibly comprising additions or deletions with respect to the reference sequence for optimal alignment between these two sequences. The percentage identity is calculated by determining the number of positions for which the nucleotide or the amino acid residue is identical between the two sequences, dividing this number of identical positions by the total number of positions compared and multiplying the resultant number by 100 so as to obtain the percentage identity between these two sequences.
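The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any of the programs cited: the function name is hypothetical, the two inputs are assumed to be already optimally aligned with `-` marking additions or deletions, and every alignment column (gap columns included) is assumed to count as a "position compared".

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage identity of two pre-aligned sequences of equal length.

    Assumptions (not taken from the text): '-' marks a gap, and gap
    columns count toward the total number of positions compared.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    # Count columns where both sequences carry the same residue (gaps excluded).
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    # Identical positions / positions compared, expressed as a percentage.
    return 100.0 * identical / len(aligned_a)


# Toy example: one substitution in ten aligned positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

Under these assumptions, a sequence meets the 80% threshold when the returned value is at least 80.0; a different gap convention would change the denominator and therefore the result.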

The expression “nucleic acid sequences having a percentage identity of at least 80%, preferably 90%, more preferably 98%, after optimal alignment with a reference sequence” is intended to denote the nucleic acid sequences which, compared to the reference nucleic acid sequence, have certain modifications, such as in particular a deletion, a truncation, an extension, a chimeric fusion and/or a substitution, in particular of the point type, and the nucleic acid sequence of which exhibits at least 80%, preferably 90%, more preferably 98%, identity, after optimal alignment, with the reference nucleic acid sequence. They are preferably sequences whose complementary sequences are capable of hybridizing specifically with the sequence SEQ ID No. 1 or SEQ ID No. 4 of the invention. Preferably, the specific or high stringency hybridization conditions will be such that they ensure at least 80%, preferably 90%, more preferably 98%, identity, after optimal alignment, between one of the two sequences and the sequence complementary to the other.

Hybridization under high stringency conditions means that the conditions of temperature and of ionic strength are chosen such that they allow the hybridization between two complementary DNA fragments to be maintained. By way of illustration, high stringency conditions for the hybridization step for the purposes of defining the polynucleotide fragments described above are advantageously as follows.

The DNA-DNA or DNA-RNA hybridization is carried out in two steps: (1) prehybridization at 42° C. for 3 hours in phosphate buffer (20 mM, pH 7.5) containing 5×SSC (1×SSC corresponds to a solution of 0.15 M NaCl+0.015 M sodium citrate), 50% of formamide, 7% of sodium dodecyl sulfate (SDS), 10×Denhardt's, 5% of dextran sulfate and 1% of salmon sperm DNA; (2) hybridization per se for 20 hours at a temperature which depends on the length of the probe (e.g., 42° C. for a probe >100 nucleotides in length), followed by 2 washes of 20 minutes at 20° C. in 2×SSC+2% SDS and 1 wash of 20 minutes at 20° C. in 0.1×SSC+0.1% SDS. The final wash is carried out in 0.1×SSC+0.1% SDS for 30 minutes at 60° C. for a probe >100 nucleotides in length. The high stringency hybridization conditions described above for a polynucleotide of defined length may be adjusted by those skilled in the art for longer or shorter oligonucleotides, according to the teaching of Sambrook et al., 1989.
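Because each buffer above is expressed as a multiple of 1×SSC (0.15 M NaCl+0.015 M sodium citrate), the corresponding salt concentrations follow by simple scaling. The short Python sketch below works this out for the buffers named in the protocol; the function name and the dictionary layout are illustrative assumptions, and the step labels merely restate the conditions given above for a probe >100 nucleotides in length.

```python
# 1xSSC composition as defined in the text.
NACL_M_PER_1X = 0.15
CITRATE_M_PER_1X = 0.015


def ssc_molarity(fold: float) -> dict[str, float]:
    """Molar NaCl and sodium citrate concentrations of an n-fold SSC buffer."""
    return {
        "NaCl_M": round(fold * NACL_M_PER_1X, 6),
        "sodium_citrate_M": round(fold * CITRATE_M_PER_1X, 6),
    }


# SSC strength used at each step of the protocol (labels are descriptive only).
steps = [
    ("prehybridization, 42 C, 3 h", 5.0),
    ("2 washes, 20 min, 20 C", 2.0),
    ("1 wash, 20 min, 20 C", 0.1),
    ("final wash, 30 min, 60 C", 0.1),
]

for label, fold in steps:
    print(label, ssc_molarity(fold))
```

The decreasing SSC multiple across the washes corresponds to decreasing ionic strength, which is what makes the later washes more stringent.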

Among the nucleic acid sequences having a percentage identity of at least 80%, preferably 90%, more preferably 98%, after optimal alignment, with the sequence according to the invention, preference is also given to the variant nucleic acid sequences of SEQ ID No. 1 or of SEQ ID No. 4, or of fragments thereof, that is to say all the nucleic acid sequences corresponding to allelic variants, that is to say individual variations of the sequence SEQ ID No. 1 or SEQ ID No. 4. These natural mutated sequences correspond to polymorphisms present in mammals, in particular in humans and, in particular, to polymorphisms which may lead to the occurrence of a pathological condition. Preferably, the present invention relates to the variant nucleic acid sequences in which the mutations lead to a modification of the amino acid sequence of the polypeptide, or of fragments thereof, encoded by the normal sequence of SEQ ID No. 1 or SEQ ID No. 4.

The expression “variant nucleic acid sequence” is also intended to denote any RNA or cDNA resulting from a mutation and/or variation of a splice site of the genomic nucleic acid sequence the cDNA of which has the sequence SEQ ID No. 1 or SEQ ID No. 4.

The invention preferably relates to a purified or isolated nucleic acid according to the present invention, characterized in that it comprises or consists of one of the sequences SEQ ID No. 1 or SEQ ID No. 4, of the sequences complementary thereto, or of the RNA sequences corresponding to SEQ ID No. 1 or SEQ ID No. 4.

The probes or primers, characterized in that they comprise a sequence of a nucleic acid according to the invention, are also part of the invention.

Thus, the present invention also relates to the primers or the probes according to the invention which may make it possible in particular to demonstrate or to distinguish the variant nucleic acid sequences, or to identify the genomic sequence of the genes the cDNA of which is represented by SEQ ID No. 1 or SEQ ID No. 4, in particular using an amplification method such as the PCR method or a related method.

The invention also relates to the use of a nucleic acid sequence according to the invention, as a probe or primer, for detecting, identifying, assaying or amplifying a nucleic acid sequence.

According to the invention, the polynucleotides which can be used as a probe or as a primer in methods for detecting, identifying, assaying or amplifying a nucleic acid sequence are a minimum of 15 bases, preferably 20 bases, or better still 25 to 30 bases in length.
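The length preferences stated above can be expressed as a simple check. The Python sketch below is illustrative only: the function name and the tier labels are hypothetical, "preferably 20 bases" is read here as "20 bases or more" (an assumption), and lengths above 30 bases, which the text does not rank, are reported separately.

```python
def probe_length_tier(sequence: str) -> str:
    """Classify a candidate probe or primer by length alone.

    Thresholds (15, 20, 25 to 30 bases) come from the text; the tier
    labels and the handling of lengths above 30 are assumptions.
    """
    n = len(sequence)
    if n < 15:
        return "below minimum"
    if n < 20:
        return "minimum"
    if n < 25:
        return "preferred"
    if n <= 30:
        return "most preferred"
    return "longer than the stated ranges"


# Toy oligonucleotides of arbitrary sequence, used only for their lengths.
for oligo in ("ACGT" * 3, "ACGT" * 4, "ACGT" * 5, "ACGT" * 7):
    print(len(oligo), probe_length_tier(oligo))
```

Length is only one criterion; specificity and hybridization behavior are outside the scope of this check.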

The probes and primers according to the invention may be labeled directly or indirectly with a radioactive or nonradioactive compound using methods well known to those skilled in the art, in order to obtain a detectable and/or quantifiable signal.

The polynucleotide sequences according to the invention which are unlabeled can be used directly as a probe or primer.

The sequences are generally labeled so as to obtain sequences which can be used in many applications. The primers or the probes according to the invention are labeled with radioactive elements or with nonradioactive molecules.

Among the radioactive isotopes used, mention may be made of ³²P, ³³P, ³⁵S, ³H or ¹²⁵I. The nonradioactive entities are selected from ligands such as biotin, avidin, streptavidin or digoxigenin, haptens, dyes and luminescent agents, such as radioluminescent, chemiluminescent, bioluminescent, fluorescent or phosphorescent agents.

The polynucleotides according to the invention may thus be used as a primer and/or probe in methods using in particular the PCR (polymerase chain reaction) technique (Rolfs et al., 1991). This technique requires choosing pairs of oligonucleotide primers bordering the fragment which must be amplified. Reference may, for example, be made to the technique described in U.S. Pat. No. 4,683,202. The amplified fragments can be identified, for example after agarose or polyacrylamide gel electrophoresis, or after a chromatographic technique such as gel filtration or ion exchange chromatography, and then sequenced. The specificity of the amplification can be controlled using, as primers, the nucleotide sequences of polynucleotides of the invention and, as templates, plasmids containing these sequences or else the derived amplification products. The amplified nucleotide fragments may be used as reagents in hybridization reactions in order to demonstrate the presence, in a biological sample, of a target nucleic acid of sequence complementary to that of said amplified nucleotide fragments.

The invention is also directed toward the nucleic acids which can be obtained by amplification using primers according to the invention.

Other techniques for amplifying the target nucleic acid may advantageously be employed as an alternative to PCR (PCR-like) using a pair of primers of nucleotide sequences according to the invention. The term “PCR-like” is intended to denote all the methods using direct or indirect reproductions of nucleic acid sequences, or else in which the labeling systems have been amplified; these techniques are, of course, known. In general, they involve amplifying the DNA with a polymerase; when the sample of origin is an RNA, a reverse transcription should be carried out beforehand. A large number of methods currently exist for this amplification, such as, for example, the SDA (strand displacement amplification) technique (Walker et al., 1992), the TAS (transcription-based amplification system) technique described by Kwoh et al. (1989), the 3SR (self-sustained sequence replication) technique described by Guatelli et al. (1990), the NASBA (nucleic acid sequence based amplification) technique described by Kievits et al. (1991), the TMA (transcription mediated amplification) technique, the LCR (ligase chain reaction) technique described by Landegren et al. (1988), the RCR (repair chain reaction) technique described by Segev (1992), the CPR (cycling probe reaction) technique described by Duck et al. (1990), and the Q-beta-replicase amplification technique described by Miele et al. (1983). Some of these techniques have since been improved.

When the target polynucleotide to be detected is an mRNA, an enzyme of the reverse transcriptase type is advantageously used, prior to carrying out an amplification reaction using the primers according to the invention or to carrying out a method of detection using the probes of the invention, in order to obtain a cDNA from the mRNA contained in the biological sample. The cDNA obtained will then serve as a target for the primers or the probes used in the amplification or detection method according to the invention.

The probe hybridization technique may be carried out in many ways (Matthews et al., 1988). The most general method consists in immobilizing the nucleic acid extracted from the cells of various tissues or from cells in culture, on a support (such as nitrocellulose, nylon or polystyrene), and in incubating the immobilized target nucleic acid with the probe, under well-defined conditions. After hybridization, the excess probe is removed and the hybrid molecules formed are detected using the appropriate method (measuring the radioactivity, the fluorescence or the enzymatic activity linked to the probe).

According to another embodiment of the nucleic acid probes according to the invention, the latter may be used as capture probes. In this case, a probe, termed “capture probe”, is immobilized on a support and is used to capture, by specific hybridization, the target nucleic acid obtained from the biological sample to be tested, and the target nucleic acid is then detected using a second probe, termed “detection probe”, labeled with a readily detectable element.

Among the advantageous nucleic acid fragments, mention should thus be made in particular of antisense oligonucleotides, i.e. oligonucleotides, the structure of which ensures, by hybridization with the target sequence, inhibition of expression of the corresponding product. Mention should also be made of sense oligonucleotides, which, by interacting with proteins involved in regulating the expression of the corresponding product, will induce either inhibition or activation of this expression.

In both cases (sense and antisense), the oligonucleotides of the invention may be used in vitro and in vivo.

The present invention also relates to an isolated polypeptide, characterized in that it comprises a polypeptide chosen from:

-   a) a polypeptide of sequence SEQ ID No. 2 or SEQ ID No. 5;
-   b) a variant polypeptide of a polypeptide of sequence defined in a);
-   c) a polypeptide homologous to a polypeptide defined in a) or b), comprising at least 80% identity with said polypeptide of a);
-   d) a fragment of at least 15 consecutive amino acids of a polypeptide defined in a), b) or c);
-   e) a biologically active fragment of a polypeptide defined in a), b) or c).

For the purpose of the present invention, the term “polypeptide” is intended to denote proteins or peptides.

The expression “biologically active fragment” is intended to mean a fragment having the same biological activity as the polypeptide from which it is derived, preferably within the same order of magnitude (to within a factor of 10). Thus, the examples show that the IBD1 protein (SEQ ID No. 2) has a potential role in apoptosis phenomena. A biologically active fragment of the IBD1 protein therefore consists of a polypeptide derived from SEQ ID No. 2, also having a role in apoptosis. The examples below propose biological functions for the IBD1 and IBD1prox proteins, as a function of the peptide domains of these proteins, and thus allow those skilled in the art to identify the biologically active fragments.

Preferably, a polypeptide according to the invention is a polypeptide consisting of the sequence SEQ ID No. 2 (corresponding to the protein encoded by the IBD1 gene) or of the sequence SEQ ID No. 5 (corresponding to the protein encoded by IBD1prox) or of a sequence having at least 80% identity with SEQ ID No. 2 or SEQ ID No. 5 after optimal alignment.

The sequence of the polypeptide has a percentage identity of at least 80%, after optimal alignment, with the sequence SEQ ID No. 2 or SEQ ID No. 5, preferably 90%, more preferably 98%.

The expression “polypeptide, the amino acid sequence of which has a percentage identity of at least 80%, preferably 90%, more preferably 98%, after optimal alignment, with a reference sequence” is intended to denote the polypeptides having certain modifications compared to the reference polypeptide, such as in particular one or more deletions and/or truncations, an extension, a chimeric fusion and/or one or more substitutions.

Among the polypeptides, the amino acid sequence of which has a percentage identity of at least 80%, preferably 90%, more preferably 98%, after optimal alignment, with the sequence SEQ ID No. 2 or SEQ ID No. 5 or with a fragment thereof according to the invention, preference is given to the variant polypeptides encoded by the variant nucleic acid sequences as defined previously, in particular the polypeptides, the amino acid sequence of which has at least one mutation corresponding in particular to a truncation, deletion, substitution and/or addition of at least one amino acid residue compared with the sequence SEQ ID No. 2 or SEQ ID No. 5 or with a fragment thereof, more preferably the variant polypeptides having a mutation associated with the pathological condition.

The present invention also relates to the cloning and/or expression vectors comprising a nucleic acid or encoding a polypeptide according to the invention. Such a vector may also contain the elements required for the expression and, optionally, the secretion of the polypeptide in a host cell. Such a host cell is also a subject of the invention.

The vectors characterized in that they comprise a promoter and/or regulator sequence according to the invention are also part of the invention.

Said vectors preferably comprise a promoter, translation initiation and termination signals, and also regions suitable for regulating transcription. It must be possible for them to be maintained stably in the cell and they may optionally contain particular signals specifying secretion of the translated protein.

These various control signals are chosen as a function of the cellular host used. To this effect, the nucleic acid sequences according to the invention may be inserted into vectors which replicate autonomously in the chosen host, or vectors which integrate in the chosen host.

Among the systems which replicate autonomously, use is preferably made, depending on the host cell, of systems of the plasmid or viral type, the viral vectors possibly being in particular adenoviruses (Perricaudet et al., 1992), retroviruses, lentiviruses, poxviruses or herpesviruses (Epstein et al., 1992). Those skilled in the art are aware of the technology which can be used for each of these systems.

When integration of the sequence into the chromosomes of the host cell is desired, use may be made, for example, of systems of the plasmid or viral type; such viruses are, for example, retroviruses (Temin, 1986), or AAVs (Carter, 1993).

Among the nonviral vectors, preference is given to naked polynucleotides such as naked DNA or naked RNA according to the technology developed by the company VICAL, bacterial artificial chromosomes (BACs), yeast artificial chromosomes (YACs) for expression in yeast, mouse artificial chromosomes (MACs) for expression in murine cells and, preferably, human artificial chromosomes (HACs) for expression in human cells.

Such vectors are prepared according to the methods commonly used by those skilled in the art, and the clones resulting therefrom can be introduced into a suitable host using standard methods, such as, for example, lipofection, electroporation, heat shock, transformation after chemical permeabilization of the membrane, or cell fusion.

The invention also comprises host cells, in particular the eukaryotic and prokaryotic cells, transformed with the vectors according to the invention, and also the transgenic animals, preferably the mammals, except humans, comprising one of said transformed cells according to the invention. These animals may be used as models, for studying the etiology of inflammatory and/or immune diseases, in particular of the inflammatory diseases of the digestive tract, or for studying cancers.

Among the cells which can be used for the purpose of the present invention, mention may be made of bacterial cells (Olins and Lee, 1993), but also yeast cells (Buckholz, 1993) as well as animal cells, in particular mammalian cell cultures (Edwards and Aruffo, 1993), and especially Chinese hamster ovary (CHO) cells. Mention may also be made of insect cells in which it is possible to use methods employing, for example, baculoviruses (Luckow, 1993). A preferred cellular host for expressing the proteins of the invention consists of COS cells.

Among the mammals according to the invention, animals such as rodents, in particular mice, rats or rabbits, expressing a polypeptide according to the invention are preferred.

Among the mammals according to the invention, preference is also given to animals such as mice, rats or rabbits, characterized in that the gene encoding the protein of sequence SEQ ID No. 2 or SEQ ID No. 5, or the sequence of which is encoded by the homologous gene in these animals, is not functional, has been knocked out or has at least one mutation.

These transgenic animals are obtained, for example, by homologous recombination on embryonic stem cells, transfer of these stem cells to embryos, selection of the chimeras affected in the reproductive lines, and growth of said chimeras.

The transgenic animals according to the invention may thus overexpress the gene encoding the protein according to the invention, or their homologous gene, or express said gene into which a mutation is introduced. These transgenic animals, in particular mice, are obtained, for example, by transfection of a copy of this gene under the control of a promoter which is strong and ubiquitous, or selective for a tissue type, or after viral transcription.

Alternatively, the transgenic animals according to the invention may be made deficient for the gene encoding one of the polypeptides of sequence SEQ ID No. 2 or SEQ ID No. 5, or their homologous genes, by inactivation using the LOXP/CRE recombinase system (Rohlmann et al., 1996) or any other system for inactivating the expression of this gene.

The cells and mammals according to the invention can be used in a method for producing a polypeptide according to the invention, as described below, and may also be used as a model for analysis.

The cells or mammals transformed as described above can also be used asmodels in order to study the interactions between the polypeptidesaccording to the invention, and the chemical or protein compoundsinvolved directly or indirectly in the activities of the polypeptidesaccording to the invention, this being in order to study the variousmechanisms and interactions involved.

They may in particular be used for selecting products which interactwith the polypeptides according to the invention, in particular theprotein of sequence SEQ ID No. 2 or SEQ ID No. 5 or variants thereofaccording to the invention, as a cofactor or as an inhibitor, inparticular a competitive inhibitor, or which have an agonist orantagonist activity with respect to the activity of the polypeptidesaccording to the invention. Preferably, said transformed cells ortransgenic animals are used as a model in particular for selectingproducts for combating pathological conditions associated with abnormalexpression of this gene.

The invention also relates to the use of a cell, of a mammal or of apolypeptide according to the invention, for screening chemical orbiochemical compounds which may interact directly or indirectly with thepolypeptides according to the invention, and/or which are capable ofmodulating the expression or the activity of these polypeptides.

Similarly, the invention also relates to a method for screeningcompounds capable of interacting, in vitro or in vivo, with a nucleicacid according to the invention, using a nucleic acid, a cell or amammal according to the invention, and detecting the formation of acomplex between the candidate compounds and the nucleic acid accordingto the invention.

The compounds thus selected are also subjects of the invention.

The invention also relates to the use of a nucleic acid sequence according to the invention, for synthesizing recombinant polypeptides.

The method for producing a polypeptide of the invention in recombinant form, which is itself included in the present invention, is characterized in that the transformed cells, in particular the cells or mammals of the present invention, are cultured under conditions which allow the expression of a recombinant polypeptide encoded by a nucleic acid sequence according to the invention, and in that said recombinant polypeptide is recovered.

The recombinant polypeptides, characterized in that they can be obtained using said method of production, are also part of the invention.

The recombinant polypeptides obtained as indicated above can be in both glycosylated and nonglycosylated form, and may or may not have the natural tertiary structure.

The sequences of the recombinant polypeptides may also be modified in order to improve their solubility, in particular in aqueous solvents.

Such modifications are known to those skilled in the art, such as, for example, deletion of hydrophobic domains or substitution of hydrophobic amino acids with hydrophilic amino acids.

These polypeptides may be produced using the nucleic acid sequences defined above, according to the techniques for producing recombinant polypeptides known to those skilled in the art. In this case, the nucleic acid sequence used is placed under the control of signals which allow its expression in a cellular host.

An effective system for producing a recombinant polypeptide requires having a vector and a host cell according to the invention.

These cells can be obtained by introducing into host cells a nucleotide sequence inserted into a vector as defined above, and then culturing said cells under conditions which allow the replication and/or expression of the transfected nucleotide sequence.

The methods used for purifying a recombinant polypeptide are known to those skilled in the art. The recombinant polypeptide may be purified from cell lysates and extracts or from the culture medium supernatant, by methods used individually or in combination, such as fractionation, chromatography methods, immunoaffinity techniques using specific monoclonal or polyclonal antibodies, etc.

The polypeptides according to the present invention can also be obtained by chemical synthesis using one of the many known forms of peptide synthesis, for example techniques using solid phases (see in particular Stewart et al., 1984) or techniques using partial solid phases, by fragment condensation or by conventional synthesis in solution.

The polypeptides obtained by chemical synthesis and which may comprise corresponding unnatural amino acids are also included in the invention.

The mono- or polyclonal antibodies, or fragments thereof, chimeric antibodies or immunoconjugates, characterized in that they are capable of specifically recognizing a polypeptide according to the invention, are part of the invention.

Specific polyclonal antibodies may be obtained from a serum of an animal immunized against the polypeptides according to the invention, in particular produced by genetic recombination or by peptide synthesis, according to the usual procedures.

The advantage of antibodies which specifically recognize certain polypeptides, variants or immunogenic fragments thereof according to the invention is in particular noted.

The mono- or polyclonal antibodies, or fragments thereof, chimeric antibodies or immunoconjugates characterized in that they are capable of specifically recognizing the polypeptides of sequence SEQ ID No. 2 or SEQ ID No. 5 are particularly preferred.

The specific monoclonal antibodies may be obtained according to the conventional method of hybridoma culture described by Köhler and Milstein (1975).

The antibodies according to the invention are, for example, chimeric antibodies, humanized antibodies, or Fab or F(ab′)₂ fragments. They may also be in the form of immunoconjugates or of labeled antibodies, in order to obtain a detectable and/or quantifiable signal.

The invention also relates to methods for detecting and/or purifying a polypeptide according to the invention, characterized in that they use an antibody according to the invention.

The invention also comprises purified polypeptides, characterized in that they are obtained using a method according to the invention.

Moreover, besides their use for purifying the polypeptides, the antibodies of the invention, in particular the monoclonal antibodies, may also be used for detecting these polypeptides in a biological sample.

They thus constitute a means for the immunocytochemical or immunohistochemical analysis of the expression of the polypeptides according to the invention, in particular the polypeptides of sequence SEQ ID No. 2 or SEQ ID No. 5, or a variant thereof, on specific tissue sections, for example using immunofluorescence, gold labeling and/or enzymatic immunoconjugates.

They may in particular make it possible to demonstrate abnormal expression of these polypeptides in the biological specimens or tissues.

More generally, the antibodies of the invention may advantageously be used in any situation where the expression of a polypeptide according to the invention, normal or mutated, must be observed.

Thus, a method for detecting a polypeptide according to the invention, in a biological sample, comprising the steps of bringing the biological sample into contact with an antibody according to the invention and demonstrating the antigen-antibody complex formed, is also a subject of the invention, as is a kit for carrying out such a method. Such a kit in particular contains:

-   a) a monoclonal or polyclonal antibody according to the invention;
-   b) optionally, reagents for constituting a medium suitable for the immunoreaction;
-   c) the reagents for detecting the antigen-antibody complex produced during the immunoreaction.

The antibodies according to the invention may also be used in the treatment of an inflammatory and/or immune disease, or of a cancer, in humans, when abnormal expression of the IBD1 gene or of the IBD1prox gene is observed. Abnormal expression means overexpression or the expression of a mutated protein.

These antibodies may be obtained directly from human serum, or may be obtained from animals immunized with polypeptides according to the invention, and then “humanized”, and may be used as such or in the preparation of a medicinal product intended for the treatment of the abovementioned diseases.

The methods for determining an allelic variability, a mutation, a deletion, a loss of heterozygosity or any genetic abnormality of the gene according to the invention, characterized in that they use a nucleic acid sequence, a polypeptide or an antibody according to the invention, are also part of the invention.

The invention in fact provides the sequence of the IBD1 and IBD1prox genes involved in inflammatory and/or immune diseases, and in particular IBDs. One of the teachings of the invention is to specify the mutations, in these nucleic acid or polypeptide sequences, which are associated with a phenotype corresponding to one of these inflammatory and/or immune diseases.

These mutations can be detected directly by analysis of the nucleic acid and of the sequences according to the invention (genomic DNA, RNA or cDNA), but also via the polypeptides according to the invention. In particular, the use of an antibody according to the invention which recognizes an epitope bearing a mutation makes it possible to distinguish between a “healthy” protein and a protein “associated with a pathological condition”.

The study of the IBD1 gene in various inflammatory and/or immune human diseases thus shows that sequence variants of this gene exist in Crohn's disease, ulcerative colitis and Blau's syndrome, as demonstrated by the examples. These sequence variations result in considerable variations in the deduced protein sequence. In fact, they are either located on very conserved sites of the protein in important functional domains, or they result in the synthesis of a truncated protein. It is therefore extremely probable that these deleterious modifications lead to a modification of the function of the protein and therefore have a causal effect in the occurrence of these diseases.

The variety of diseases in which these mutations are observed suggests that the IBD1 gene is potentially important in many inflammatory and/or immune diseases. This result should be compared with the fact that the pericentromeric region of chromosome 16 has been described as containing genes for susceptibility to various human diseases, such as ankylosing spondylarthritis or psoriatic arthropathy. It may therefore be considered that IBD1 has an important role in a large number of inflammatory and/or immune diseases.

In particular, IBD1 can be associated with granulomatous inflammatory diseases. Blau's syndrome and CD are in fact diseases which are part of this family. It is therefore hoped that variations in the IBD1 gene will be found for the other diseases of the same family (sarcoidosis, Behcet's disease, etc.).

In addition, the involvement of IBD1 in the cellular pathways leading to apoptosis raises the question of its possible carcinogenic role. In fact, it is expected that a dysregulation of IBD1 may result in a predisposition to cancer. This hypothesis is supported by the fact that a predisposition to colon cancer exists in inflammatory bowel diseases. IBD1 may in part explain this susceptibility to cancer and define new carcinogenic pathways.

The precise description of the mutations which can be observed in the IBD1 gene thus makes it possible to lay down the foundations of a molecular diagnosis for the inflammatory or immune diseases in which this role is demonstrated. Such an approach, based on searching for mutations in the gene, will make it possible to contribute to the diagnosis of these diseases and possibly to reduce the extent of certain additional examinations which are invasive or expensive. The invention lays down the foundations of such a molecular diagnosis based on searching for mutations in IBD1.

The molecular diagnosis of inflammatory diseases should also make it possible to improve the nosological classification of these diseases and to more clearly define subgroups of particular diseases by their clinical characteristics, the progressive nature of the disease or the response to certain treatments. By way of example, the dismantling of the existing mutations may thus make it possible to classify the currently undetermined forms of colitis which represent more than 10% of inflammatory bowel diseases. Such an approach will make it possible to propose an early treatment suitable for each patient. In general, such an approach makes it possible to hope that it will eventually be possible to define an individualized treatment for the disease, depending on the genetic area of each disease, including curative and preventive measures.

In particular, preference is given to a method of diagnosis and/or of prognostic assessment of an inflammatory disease or of a cancer, characterized in that the presence of at least one mutation and/or a deleterious modification of expression of the gene corresponding to SEQ ID No. 1 or SEQ ID No. 4 is determined, using a biological specimen from a patient, by analyzing all or part of a nucleic acid sequence corresponding to said gene. The genes SEQ ID No. 3 or SEQ ID No. 6 may also be studied.

This method of diagnosis and/or of prognostic assessment may be used preventively (a study of predisposition to inflammatory diseases or to cancer), or in order to serve in establishing and/or confirming a clinical condition in a patient.

Preferably, the inflammatory disease is an inflammatory disease of the digestive tract, and the cancer is a cancer of the digestive tract (small intestine or colon).

The teaching of the invention in fact makes it possible to determine the mutations which exhibit a linkage disequilibrium with inflammatory diseases of the digestive tract, and which are therefore associated with such diseases.

The analysis may be carried out by sequencing all or part of the gene, or by other methods known to those skilled in the art. Methods based on PCR, for example PCR-SSCP, which makes it possible to detect point mutations, may in particular be used.

The analysis may also be carried out by attaching a probe according to the invention, corresponding to one of the sequences SEQ ID No. 1, 3, 4 or 6, to a DNA chip, and hybridization on these microplates. A DNA chip containing a sequence according to the invention is also one of the subjects of the invention.

Similarly, a protein chip containing an amino acid sequence according to the invention is also a subject of the invention. Such a protein chip makes it possible to study the interactions between the polypeptides according to the invention and other proteins or chemical compounds, and may thus be useful for screening compounds which interact with the polypeptides according to the invention. The protein chips according to the invention may also be used to detect the presence of antibodies directed against the polypeptides according to the invention in the serum of patients. A protein chip containing an antibody according to the invention may also be used.

Those skilled in the art are also able to carry out techniques for studying the deleterious modification of the expression of a gene, for example by studying the mRNA (in particular by Northern blotting or with RT-PCR experiments, with probes or primers according to the invention), or the protein expressed, in particular by Western blotting, using antibodies according to the invention.

The gene tested is preferably the gene of sequence SEQ ID No. 1, the inflammatory disease for which the intention is to predict susceptibility being a disease of the digestive tract, in particular Crohn's disease or ulcerative colitis. If the intention is to detect a cancer, it is preferably colon cancer.

The invention also relates to methods for obtaining an allele of the IBD1 gene, associated with a detectable phenotype, comprising the following steps:

-   a) obtaining a nucleic acid sample from an individual expressing said detectable phenotype;
-   b) bringing said nucleic acid sample into contact with an agent capable of specifically detecting a nucleic acid encoding the IBD1 protein;
-   c) isolating said nucleic acid encoding the IBD1 protein.

Such a method may be followed by a step of sequencing all or part of the nucleic acid encoding the IBD1 protein, which makes it possible to predict susceptibility to an inflammatory disease or to a cancer.

The agent capable of specifically detecting a nucleic acid encoding the IBD1 protein is advantageously an oligonucleotide probe according to the invention, which may be made up of DNA, RNA or PNA, which may or may not be modified. The modifications may include radioactive or fluorescent labeling, or may be due to modifications in the bonds between the bases (phosphorothioates or methyl phosphonates, for example). Those skilled in the art are aware of the protocols for isolating a specific DNA sequence. Step b) of the method described above may also be an amplification step as described above.

The invention also relates to a method for detecting and/or assaying a nucleic acid according to the invention, in a biological sample, comprising the steps of bringing a probe according to the invention into contact with a biological sample, and detecting and/or assaying the hybrid formed between said polynucleotide and the nucleic acid of the biological sample.

Those skilled in the art are capable of carrying out such a method, and may in particular use a kit of reagents, comprising:

-   a) a polynucleotide according to the invention, used as a probe;
-   b) the reagents required for carrying out a hybridization reaction between said probe and the nucleic acid of the biological sample;
-   c) the reagents required for detecting and/or assaying the hybrid formed between said probe and the nucleic acid of the biological sample;

which is also a subject of the invention.

Such a kit may also contain positive or negative controls in order to ensure the quality of the results obtained.

However, in order to detect and/or assay a nucleic acid according to the invention, those skilled in the art may also perform an amplification step using primers chosen from the sequences according to the invention.

Finally, the invention also relates to the compounds chosen from a nucleic acid, a polypeptide, a vector, a cell or an antibody according to the invention, or the compounds obtained using the screening methods according to the invention, as a medicinal product, in particular for preventing and/or treating an inflammatory and/or immune disease, or a cancer, associated with the presence of at least one mutation of the gene corresponding to SEQ ID No. 1 or SEQ ID No. 4, preferably an inflammatory disease of the digestive tract, in particular Crohn's disease or ulcerative colitis.

The following examples make it possible to understand more clearly the advantages of the invention, and should not be considered to limit the scope of the invention.

DESCRIPTION OF THE FIGURES

FIG. 1: Nonparametric genetic linkage tests for Crohn's disease in the pericentromeric region of chromosome 16 (according to Hugot et al., 1996). Multipoint linkage analysis based on identity by descent for the markers of the pericentromeric region of chromosome 16. The genetic distances between markers were estimated using the CRIMAP program. The lod score (MAPMAKER/SIBS) is indicated on the left-hand figure. Two pseudoprobability tests were developed and reported on the right-hand figure. The first (Tz) is analogous to the test of the means. The second (Tz2) is analogous to the test of the proportion of affected pairs sharing two alleles.

FIG. 2: Multipoint nonparametric genetic linkage analysis. 78 families with several relatives suffering from Crohn's disease were genotyped for 26 polymorphism markers in the pericentromeric region of chromosome 16. The location of each marker is symbolized by an arrow. The order of the markers and the distance separating them derive from the analysis of the experimental data with the Crimap software. The arrows under the curve indicate the markers SPN, D16S409 and D16S411 used in the first study published (Hugot et al., 1996). The arrows located at the top of the figure correspond to the markers D16S3136, D16S541, D16S3117, D16S416 and D16S770 located at the maximum of the genetic linkage test. The typing data were analyzed using the multipoint nonparametric analysis program of the Genehunter software version 1.3. The maximum NPL score is 3.33 (p=0.0004).

FIG. 3: Diagrammatic representation of the protein encoded by IBD1. The protein encoded by IBD1 is represented horizontally. The various domains of which it is composed are indicated on the figure with the amino acid reference number corresponding to the start and to the end of each domain. The protein consists of a CARD domain, a nucleotide-binding domain (NBD) and leucine-rich motifs (LRR).

FIG. 4: Diagrammatic representation of the IBD1/NOD2 protein in three variants associated with CD.

A: The translation product deduced from the cDNA sequence of the IBD1 candidate gene is identical to that of NOD2 (Ogura et al., 2000). The polypeptide contains 2 CARD domains (CAspase Recruitment Domains), a nucleotide-binding domain (NBD) and 10 repeats of 27 amino acids, leucine-rich motifs (LRR). The consensus sequence of the ATP/GTP-binding site of the motif A (P loop) of the NBD is indicated with a black circle. The sequence changes encoded by the three main variants associated with CD are SNP 8 (R675W), SNP 12 (G881R) and SNP 13 (frame shift 980). The frame shift changes a leucine codon to a proline codon at position 980, which is immediately followed by a stop codon.

B: Rare missense variants of NOD2 in 457 CD patients, 159 UC patients and 103 unaffected, unrelated individuals. The positions of the rare missense variants are indicated for the three groups. The scale on the left indicates the number of each variant identified in the groups under investigation and that on the right measures the frequency of the mutation. The allelic frequencies of the polymorphism V928I were not significantly different (0.92:0.08) in the three groups and the corresponding genotypes were in Hardy-Weinberg equilibrium.
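The Hardy-Weinberg statement in the legend of FIG. 4B can be illustrated by the following Python sketch. It is a minimal illustration of the standard chi-square comparison between observed and expected genotype counts, not the procedure actually used for the figure; the function name and the genotype counts in the usage line are hypothetical, since the genotype counts for V928I are not given here.

```python
def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Compare observed genotype counts with Hardy-Weinberg expectations.

    n_aa, n_ab, n_bb: observed counts of the three genotypes of a
    biallelic polymorphism (hypothetical inputs for illustration).
    Returns (frequency of allele a, expected counts, chi-square statistic).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele a
    q = 1.0 - p                          # frequency of allele b
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, expected, chi2


# Hypothetical counts chosen to sit exactly at equilibrium.
p, expected, chi2 = hardy_weinberg_chi2(81, 18, 1)
print(p, expected, chi2)
```

With the allelic frequencies of 0.92 and 0.08 quoted above, the genotype proportions expected at equilibrium would be 0.8464, 0.1472 and 0.0064.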

EXAMPLES

Example 1 Fine Location of IBD1

The first step toward identifying the IBD1 gene was to reduce the size of the genetic region of interest, initially centered on the marker D16S411 located between D16S409 and D16S419 (Hugot et al., 1996 and FIG. 1). A group of close markers (high resolution genetic map) was used in order to more clearly specify the genetic region, and made it possible to complete the genetic linkage analyses and to search for a genetic linkage disequilibrium with the disease.

The study related to 78 families comprising at least 2 relatives suffering from CD, which corresponded to 119 affected pairs. The families comprising sick individuals suffering from UC were excluded from the study.

Twenty-six genetic polymorphism markers of the microsatellite type were studied. These markers together made up a high resolution map with an average distance between markers of the order of 1 cM in the genetic region of interest. The characteristics of the markers studied are given in table 1.

TABLE 1 Polymorphic markers of the microsatellite type used for the fine location of IBD1

| Name of polymorphism marker | Cumulative distance (cM) | PCR primers |
|---|---|---|
| D16S3120 (AFM326vc5) | 0 | SEQ ID No. 7, SEQ ID No. 8 |
| D16S298 (AFMa189wg5) | 2.9 | SEQ ID No. 9, SEQ ID No. 10 |
| D16S299 | 3.4 | SEQ ID No. 11, SEQ ID No. 12 |
| SPN | 3.9 | SEQ ID No. 13, SEQ ID No. 14 |
| D16S383 | 4.3 | SEQ ID No. 15, SEQ ID No. 16 |
| D16S753 (GGAA3G05) | 4.9 | SEQ ID No. 17, SEQ ID No. 18 |
| D16S3044 (AFMa222za9) | 5.8 | SEQ ID No. 19, SEQ ID No. 20 |
| D16S409 (AFM161xa1) | 5.8 | SEQ ID No. 21, SEQ ID No. 22 |
| D16S3105 (AFMb341zc5) | 6.1 | SEQ ID No. 23, SEQ ID No. 24 |
| D16S261 (MFD24) | 6.8 | SEQ ID No. 25, SEQ ID No. 26 |
| D16S540 (GATA7B02) | 6.9 | SEQ ID No. 27, SEQ ID No. 28 |
| D16S3080 (AFMb068zb9) | 7 | SEQ ID No. 29, SEQ ID No. 30 |
| D16S517 (AFMa132we9) | 7 | SEQ ID No. 31, SEQ ID No. 32 |
| D16S411 (AFM186xa3) | 8 | SEQ ID No. 33, SEQ ID No. 34 |
| D16S3035 (AFMa189wg5) | 10.4 | SEQ ID No. 35, SEQ ID No. 36 |
| D16S3136 (AFMa061xe5) | 10.4 | SEQ ID No. 37, SEQ ID No. 38 |
| D16S541 (GATA7E02) | 11.4 | SEQ ID No. 39, SEQ ID No. 40 |
| D16S3117 (AFM288wb1) | 11.5 | SEQ ID No. 41, SEQ ID No. 42 |
| D16S416 (AFM210yg3) | 12.4 | SEQ ID No. 43, SEQ ID No. 44 |
| D16S770 (GGAA20G02) | 13.2 | SEQ ID No. 45, SEQ ID No. 46 |
| D16S2623 (GATA81B12) | 15 | SEQ ID No. 47, SEQ ID No. 48 |
| D16S390 | 16.5 | SEQ ID No. 49, SEQ ID No. 50 |
| D16S419 (AFM225zf2) | 20.4 | SEQ ID No. 51, SEQ ID No. 52 |
| D16S771 (GGAA23C09) | 21.8 | SEQ ID No. 53, SEQ ID No. 54 |
| D16S408 (AFM137xf8) | 25.6 | SEQ ID No. 55, SEQ ID No. 56 |
| D16S508 (AFM304xf1) | 38.4 | SEQ ID No. 57, SEQ ID No. 58 |

Each marker is listed according to international nomenclature and, in most cases, also by the name proposed by the laboratory of origin (in parentheses). The markers appear according to their order on the chromosome (from 16p to 16q). The cumulative genetic distance between the markers (in Kosambi centiMorgans, calculated from the experimental data using the Crimap program) is indicated in the second column. The first polymorphic marker is taken arbitrarily as the reference point. The oligonucleotides which were used for the polymerase chain reaction (PCR) are indicated in the third column.
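For readers unfamiliar with the unit, the Kosambi map function relates the recombination fraction r between two markers to a map distance. The following Python sketch states the standard textbook formula and its inverse; it is an illustration only, it is not taken from the Crimap program, and the function names are hypothetical.

```python
import math


def kosambi_cm(r):
    """Map distance in centiMorgans for a recombination fraction r.

    Kosambi: d (Morgans) = 1/4 * ln((1 + 2r) / (1 - 2r)), valid for 0 <= r < 0.5.
    """
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm):
    """Inverse function: recombination fraction for a distance in centiMorgans."""
    return 0.5 * math.tanh(2 * d_cm / 100.0)


# For closely spaced markers, 1% recombination is close to 1 cM.
print(kosambi_cm(0.01))
print(kosambi_r(1.0))
```

For the closely spaced markers of table 1, the map distance in cM and the recombination fraction in percent are therefore nearly interchangeable; the two diverge only for distant markers.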

The genotyping of these microsatellite markers was based on automatic sequencer technology using fluorescent primers. Briefly, after amplification, the fluorescent polymerase chain reaction (PCR) products were loaded onto a polyacrylamide gel on an automatic sequencer according to the manufacturer's recommendations (Perkin Elmer). The size of the alleles for each individual was deduced using the Genescan® and Genotyper® software. The data were then kept on an integrated computer base containing the genealogical, phenotypic and genetic data. They were then used for the genetic linkage analyses.

Several quality controls were carried out throughout the genotyping procedure:

-   independent double reading of the genotyping data,
-   use of a standard DNA as an internal control for each electrophoretic migration,
-   control of the size range for each allele observed,
-   search for mendelian transmission errors,
-   calculation of the genetic distance between markers (CRIMAP program) and comparison of this distance with the data from the literature,
-   further typing of the markers for which recombination between close markers was observed.

The genotyping data were analyzed by multipoint nonparametric genetic linkage methods (GENEHUNTER program version 1.3). The informativeness of the marker system was greater than 80% for the region studied. The test maximum (NPL=3.33; P=0.0004) was obtained for the markers D16S541, D16S3117, D16S770 and D16S416 (FIG. 2).
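As a rough plausibility check, and assuming that the NPL statistic is approximately standard normal in the absence of linkage (an asymptotic approximation; the way in which the GENEHUNTER program derived the reported P value is not stated here), the one-sided tail probability for NPL=3.33 can be sketched in Python as follows. The function name is hypothetical.

```python
import math


def upper_tail_normal(z):
    """One-sided probability P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Normal-approximation tail probability for the test maximum reported above.
p_value = upper_tail_normal(3.33)
print(f"{p_value:.5f}")
```

Under this assumption the tail probability is of the same order as the reported value of 0.0004.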

The typing data for these 26 polymorphism markers were also analyzed so as to search for a transmission disequilibrium. Two groups of 108 and 76 families with one or more sick individuals suffering from CD were studied. The statistical test for transmission disequilibrium has been described by Spielman et al. (1993). In this study, only one sick individual per family was taken into account, and the value of p was corrected by the number of alleles tested for each marker studied.

A transmission disequilibrium was observed for alleles 4 and 5 (size 205 and 207 base pairs, respectively) of the marker D16S3136 (p=0.05 and p=0.01, respectively).
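By way of illustration, the transmission disequilibrium test in its simplest form compares, among heterozygous parents, the number of times a given allele is transmitted to the affected child with the number of times it is not transmitted. The Python sketch below uses hypothetical counts (the transmission counts of the study are not given here) and assumes that the correction for the number of alleles tested is a simple multiplication of the p value, which is only one possible reading of the text.

```python
import math


def tdt(transmitted, untransmitted, n_alleles_tested=1):
    """McNemar-type transmission disequilibrium test for one allele.

    transmitted / untransmitted: counts from heterozygous parents
    (hypothetical inputs). Returns (chi-square with 1 degree of freedom,
    uncorrected p value, p value multiplied by the number of alleles tested).
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Survival function of a chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    # Assumed multiplicative correction, capped at 1.
    p_corrected = min(1.0, p * n_alleles_tested)
    return chi2, p, p_corrected


# Hypothetical example: 30 transmissions versus 10 non-transmissions,
# for a marker at which 8 alleles were tested.
print(tdt(30, 10, n_alleles_tested=8))
```

Restricting the analysis to one sick individual per family, as described above, keeps the transmissions independent, which the chi-square approximation requires.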

These results, which suggest an association between the marker D16S3136 and CD, led to the construction of a physical map of the genetic region centered on D16S3136 and to establishment of the sequence of a large genomic DNA segment (BAC) containing this polymorphic site. It was then possible to identify and analyze a larger number of polymorphism markers in the region of D16S3136, and also to define and study the transcribed sequences present in the region.

Example 2 Physical Mapping of the IBD1 Region

A contig of genomic DNA fragments, centered on the markers D16S3136, D16S3117, D16S770 and D16S416, was generated from the human genomic DNA libraries of the Jean Dausset foundation/CEPH. The chromosomal DNA segments were identified based on certain polymorphism markers used in fine genetic mapping (D16S411, D16S416, D16S541, D16S770, D16S2623, D16S3035, D16S3117 and D16S3136). For each marker, a bacterial artificial chromosome (BAC) library was screened by PCR so as to search for clones containing the marker sequence. Depending on whether or not the sequences tested were present on the BAC clones, it was then possible to organize the clones among one another using the Segmap software version 3.35.

It was possible to establish, for the BACs, a continuous organization (contig) covering the genetic region of interest, according to a method known to those skilled in the art (Rouquier et al., 1994; Kim et al., 1996; Asakawa et al., 1997). To do this, the ends of the BACs identified were sequenced and these new sequence data were then used to repeatedly screen the BAC libraries. At each screening, the BAC contig then progressed by a step until a continuum of overlapping clones was obtained. The size of each BAC contributing to the contig was deduced from its migration profile on a pulsed field agarose gel.

A BAC contig containing 101 BACs and extending over an overall distance of more than 2.5 Mb, with an average redundancy of 5.5 BACs at each point of the contig, was thus constructed. The average size of the BACs is 136 kb.
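The average redundancy quoted above follows from the other figures of the paragraph, as the following short Python calculation shows. It is a simple arithmetic check; the function name is hypothetical, and the region length is taken as exactly 2.5 Mb although the text states "more than 2.5 Mb".

```python
def mean_redundancy(n_clones, mean_clone_kb, region_kb):
    """Average number of clones covering each point of the region."""
    return n_clones * mean_clone_kb / region_kb


# 101 BACs of 136 kb on average over a region taken as 2500 kb.
depth = mean_redundancy(101, 136, 2500)
print(round(depth, 1))
```

Because the region is slightly longer than 2.5 Mb, this value is best read as an upper estimate of the true average depth.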

Example 3 Sequencing of BAC hb87b10

The BAC of this contig containing the polymorphism marker D16S3136 (called hb87b10), the size of which was 163761 bp, was sequenced according to the “shotgun” method. Briefly, the BAC DNA was fragmented by sonication. The DNA fragments thus generated were subjected to agarose gel electrophoresis and those with a size greater than 1.5 kb were eluted in order to be analyzed. These fragments were then cloned into the m13 phage, which was itself introduced into bacteria made competent, by electroporation. After culturing, the DNA of the clones was recovered and sequenced by automatic sequencing methods using fluorescent primers of the m13 vector on an automatic sequencer.

1526 different sequences with an average size of 600 bp were generated, which were organized with respect to one another using the Polyphredphrap® software, resulting in a sequence contig covering the entire BAC. The sequence thus generated had an average redundancy of 5.5 genomic equivalents. The rare (n=5) sequence gaps not represented in the m13 clone library were filled by generating specific PCR primers, on either side of these gaps, and analyzing the PCR product derived from the genomic DNA of a healthy individual.
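The redundancy and the small number of gaps reported above are of the order expected for a random shotgun library. The following Python sketch applies the simplest form of the Lander-Waterman approximation (reads placed uniformly at random, any overlap assumed detectable), under which the expected number of contigs, and hence approximately of gaps, is N·exp(−c) for N reads at coverage c. This is an illustrative assumption and not a description of the assembly software used; the function name is hypothetical.

```python
import math


def shotgun_stats(n_reads, mean_read_bp, target_bp):
    """Coverage and expected gap count for a random shotgun library.

    Uses the idealized Lander-Waterman approximation with no minimum
    overlap requirement (an assumption made for illustration).
    """
    coverage = n_reads * mean_read_bp / target_bp
    expected_gaps = n_reads * math.exp(-coverage)
    return coverage, expected_gaps


# Figures quoted in this example: 1526 reads of 600 bp on a 163761 bp BAC.
coverage, gaps = shotgun_stats(1526, 600, 163761)
print(f"coverage = {coverage:.2f}, expected gaps = {gaps:.1f}")
```

With the figures of this example, the calculation gives a coverage of roughly 5.6 genomic equivalents and an expected number of gaps of roughly 6, in line with the values reported.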

Sequence homologies with sequences available in public genetic databases (Genbank) were sought. No known gene could be identified in this region of 163 kb. Several ESTs were positioned, suggesting that unknown genes were contained in this sequence. These ESTs derived from the public genetic databases (Genbank, GDB, Unigene, dbEST) bore the following references: AI167910, A1011720, Rn24957, Mm30219, hs132289, AA236306, hs87296, AA055131, hs151708, AA417809, AA417810, hs61309, hs116424, HUMGS01037, AA835524, hs105242, SHGC17274, hs146128, hs122983, hs87280 and hs135201. The search for putative exons using the GRAIL computer program made it possible to identify several potential exons, polyadenylation sites and promoter sequences.

Example 4 Transmission Disequilibrium Studies

12 biallelic polymorphism markers (SNPs) were identified in a region extending over approximately 250 kb and centered on the BAC hb87b10. These polymorphisms were generated by analyzing the sequence of ten or so independent sick individuals suffering from CD. The sequencing was mostly carried out at known ESTs positioned on the BAC or in the region thereof. Putative exons, predicted by the GRAIL computer program, were also analyzed. The characteristics of the polymorphic markers thus identified are given in table 2.

TABLE 2 Characteristics of biallelic polymorphism markers studied in the region of IBD1

I    II                 III        IV       V                      VI
1    KIAA0849ex9        AS-PCR              SEQ ID No. 88 to 90    116
2    hb27G11F           PCR-RFLP   BsrI     SEQ ID No. 86, 87      185, 116, 69
3    Ctg22Ex1           PCR-RFLP   RsaI     SEQ ID No. 84, 85      381, 313, 69
4    SNP1               AS-PCR              SEQ ID No. 81 to 83    410
5    ctg2931-3ac/ola    LO                  SEQ ID No. 78 to 80    51, 49
6    ctg2931-5ag/ola    LO                  SEQ ID No. 75 to 77    44, 42
7    SNP3-2931          AS-PCR              SEQ ID No. 72 to 74    245
8    Ctg25Ex1           PCR-RFLP   BsteII   SEQ ID No. 70, 71      207, 122, 85
9    CTG35ExA           AS-PCR              SEQ ID No. 67 to 69    333
10   ctg35ExC           AS-PCR              SEQ ID No. 64 to 66    198
11   D16S3136                               SEQ ID No. 37, 38
12   hb133D1f           PCR-RFLP   TaqI     SEQ ID No. 62, 63      369, 295, 74
13   D16S3035                               SEQ ID No. 35, 36
14   ADCY7int7          AS-PCR              SEQ ID No. 59 to 61    140

AS-PCR: allele-specific PCR; LO: ligation of oligonucleotides

The 12 biallelic polymorphism markers newly described in this study are listed in this table, together with the markers D16S3136 and D16S3035. For each one of them, the following are indicated:

-   the locus (column I)
-   the name (column II)
-   the genotyping technique used (column III)
-   the restriction enzyme used, where applicable (column IV)
-   the oligonucleotide primers used for the polymerase chain reaction or for the ligation (column V)
-   the size of the products expected during typing (column VI)

199 families comprising 1 or more sick individuals suffering from CD were typed for these 12 polymorphism markers and also for the markers D16S3035 and D16S3136 located on the BAC hb87b10. The families comprising sick individuals suffering from UC were not taken into account. The methods for typing the polymorphisms studied varied depending on the type of polymorphism, using:

-   the PCR-RFLP technique (amplification followed by enzymatic digestion of the PCR product) when the polymorphism was located on an enzymatic restriction site.
-   PCR with primers specific for the polymorphic site: differential amplification of two alleles using primers specific for each allele.
-   Oligoligation test: differential ligation using oligonucleotides specific for each allele, followed by polyacrylamide gel electrophoresis.
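For the PCR-RFLP markers, a genotype can be read from the band pattern using the expected product sizes of table 2. The following is a minimal sketch, not the procedure actually used in the study: the function name, the size tolerance and the "cut"/"uncut" allele labels are assumptions made for illustration, since the text does not state which allele carries the restriction site.

```python
# Expected fragment sizes (bp) from table 2: undigested product, then the two digestion products.
RFLP_MARKERS = {
    "hb27G11F": {"enzyme": "BsrI", "uncut": 185, "cut": (116, 69)},
    "Ctg25Ex1": {"enzyme": "BsteII", "uncut": 207, "cut": (122, 85)},
    "hb133D1f": {"enzyme": "TaqI", "uncut": 369, "cut": (295, 74)},
}


def call_rflp_genotype(marker: str, observed_bands: set, tolerance: int = 3) -> str:
    """Return 'uncut/uncut', 'uncut/cut' or 'cut/cut' from observed band sizes (bp).

    `tolerance` (hypothetical value) absorbs gel sizing error.
    """
    spec = RFLP_MARKERS[marker]

    def seen(size: int) -> bool:
        return any(abs(size - band) <= tolerance for band in observed_bands)

    has_uncut = seen(spec["uncut"])
    has_cut = all(seen(size) for size in spec["cut"])
    if has_uncut and has_cut:
        return "uncut/cut"      # heterozygote: all three bands present
    if has_uncut:
        return "uncut/uncut"    # site absent on both chromosomes
    if has_cut:
        return "cut/cut"        # site present on both chromosomes
    raise ValueError(f"band pattern {sorted(observed_bands)} does not match {marker}")
```

For example, `call_rflp_genotype("hb27G11F", {185, 116, 69})` classifies the sample as a heterozygote.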

The typing data were then analyzed using a transmission disequilibrium test (TDT computer program of the GENEHUNTER software version 2). For the families comprising several affected relatives, a single sufferer was taken into account for the analysis. If several related sufferers are taken into account, the data are no longer independent in the statistical calculations, which can inflate the value of the test. The sufferer used for the analysis was drawn at random, within each family, using an automatic randomization procedure. Given this randomization, the value of the statistical test obtained represented only one possible sample derived from the group of families studied. So as not to limit the analysis to this one possible sample, and in order to assess more clearly the soundness of the results obtained, about one hundred random samples were generated and analyzed for each test.
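The resampling strategy described above can be sketched as follows. This is an illustrative re-implementation of the classical McNemar-type TDT statistic, (b − c)²/(b + c) with 1 degree of freedom, and not the GENEHUNTER code; the data layout, function names and toy families are hypothetical.

```python
import math
import random


def tdt(transmissions):
    """McNemar-type TDT on (transmitted, untransmitted) allele pairs from heterozygous parents.

    Returns (chi2, p) for allele 1 versus allele 2, with 1 degree of freedom.
    """
    b = sum(1 for pair in transmissions if pair == (1, 2))  # allele 1 transmitted
    c = sum(1 for pair in transmissions if pair == (2, 1))  # allele 2 transmitted
    if b + c == 0:
        return 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail for df = 1
    return chi2, p


def resampled_tdt(families, n_samples=100, seed=0):
    """Run the TDT n_samples times, drawing one affected child per family each time."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        pairs = []
        for affected_children in families.values():
            pairs.extend(rng.choice(affected_children))
        results.append(tdt(pairs))
    return results


# Hypothetical toy data: each affected child is a list of (transmitted, untransmitted)
# pairs contributed by its heterozygous parents.
families = {
    "fam1": [[(1, 2), (1, 2)], [(1, 2), (2, 1)]],
    "fam2": [[(1, 2)]],
    "fam3": [[(2, 1)], [(1, 2)]],
}
results = resampled_tdt(families)
share_significant = sum(p < 0.05 for _, p in results) / len(results)
```

Reporting the share of random samples that reach a given threshold, rather than a single p value, mirrors the way the results are summarized below ("for the majority of the random samples").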

The markers were studied separately and then grouped according to their order on the chromosomal segment (KIAA0849ex9 (locus 1), hb27G11F (locus 2), Ctg22Ex1 (locus 3), SNP1 (locus 4), ctg2931-3ac/ola (locus 5), ctg2931-5ag/ola (locus 6), SNP3-2931 (locus 7), Ctg25Ex1 (locus 8), CTG35ExA (locus 9), ctg35ExC (locus 10), D16S3136 (locus 11), hb133D1f (locus 12), D16S3035 (locus 13), ADCY7int7 (locus 14)) (table 2). The haplotypes comprising 2, 3 and 4 consecutive markers were thus analyzed, still using the same strategy (100 random samples, taking a single affected individual for each family).
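The sets of 2, 3 and 4 consecutive markers can be enumerated with a simple sliding window over the marker order given above. This is only a sketch of the enumeration step; the function name is illustrative, and each window still gives rise to several haplotypes (allele combinations) to be tested.

```python
# Marker order along the chromosomal segment, as listed in the text (loci 1 to 14).
MARKERS = [
    "KIAA0849ex9", "hb27G11F", "Ctg22Ex1", "SNP1", "ctg2931-3ac/ola",
    "ctg2931-5ag/ola", "SNP3-2931", "Ctg25Ex1", "CTG35ExA", "ctg35ExC",
    "D16S3136", "hb133D1f", "D16S3035", "ADCY7int7",
]


def consecutive_windows(markers, sizes=(2, 3, 4)):
    """Yield every run of consecutive markers for each requested window size."""
    for size in sizes:
        for start in range(len(markers) - size + 1):
            yield tuple(markers[start:start + size])


windows = list(consecutive_windows(MARKERS))
```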

For each sample tested, only the genotypes (or haplotypes) carried by at least 10 parental chromosomes were taken into account. On average, 250 different tests were thus carried out for each sample. It was then possible to deduce the number of tests expected to be positive for each significance threshold and to compare this distribution to the distribution observed. For the healthy individuals, the distribution of the tests is not different from that expected on a random basis (χ²=2.85, df=4, p=0.58). For the sick individuals, on the other hand, there is an excess of positive tests, reflecting the existence of a transmission disequilibrium in the region studied.
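The p value quoted for the healthy individuals can be checked by hand, because the upper tail of a chi-square distribution has a closed form when the number of degrees of freedom is even. The sketch below (function names are illustrative) also shows how the number of tests expected to be positive by chance follows from the number of tests and the threshold.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even number of degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def expected_positives(n_tests: int, alpha: float) -> float:
    """Number of tests expected below threshold alpha if no association exists."""
    return n_tests * alpha


p_value = chi2_sf_even_df(2.85, 4)           # statistic and df reported in the text
by_chance = expected_positives(250, 0.05)    # about 250 tests per sample
```

With χ²=2.85 and 4 degrees of freedom, `p_value` rounds to the 0.58 reported in the text.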

The transmission disequilibrium tests, for each polymorphism marker taken separately and for the haplotypes showing the strongest transmission disequilibrium, showed that the following markers and the disease are in linkage disequilibrium: Ctg22Ex1 (locus 3), SNP1 (locus 4), ctg2931-5ag/ola (locus 6), SNP3-2931 (locus 7), Ctg25Ex1 (locus 8) and ctg35ExC (locus 10). These markers extend over a region of approximately 50 kb (positions 74736 to 124285 on the sequence of hb87b10).

The haplotypes most strongly associated with Crohn's disease also extend over this region. Thus, for the majority of the random samples, the transmission test was positive (p<0.01) for haplotypes combining the following markers:

-   locus 5-6, locus 6-7, locus 7-8, locus 8-9, locus 9-10, locus 10-11
-   locus 5-6-7, locus 6-7-8, locus 7-8-9, locus 8-9-10, locus 9-10-11
-   locus 5-6-7-8, locus 6-7-8-9, locus 7-8-9-10.

The susceptibility haplotype most at risk is defined by the loci 7 to 10. This is the haplotype 1-2-1-2 (table 2).

The markers tested are, as expected, in linkage disequilibrium with respect to one another.

More recently, a new test, the Pedigree Disequilibrium Test (PDT), published in July 2000 (Martin et al., 2000), was used to understand more clearly the meaning of the results obtained with the TDT computer program. This new statistic in fact makes it possible to use all of the information available in a family, both from the sick individuals and from the healthy individuals, and to counterbalance the importance of each relative in an overall statistic for each family. The values of p corresponding to the PDT tests and obtained for an enlarged group of 235 families with one or more relatives suffering from Crohn's disease are given in table 3. This new analysis confirms that the region of the BAC hb87b10 is indeed associated with Crohn's disease.

TABLE 3 Results of the PDT tests carried out on 235 families suffering from Crohn's disease

LOCUS              VALUE p OF THE PDT TEST
KIAA0849ex9        NS
hb27g11f           0.05
ctg22ex1           0.01
SNP1               0.001
ctg2931-3ac/ola    NS
ctg2931-5ag/ola    0.0001
SNP3-2931          0.0001
ctg25ex1           0.0006
ctg35exA           NS
ctg35exC           0.00002
D16S3136           NS
hb133d1f           NS
D16S3035           NS

(NS: not significant)

Example 5 Identification of the IBD1 Gene

The published EST groups (Unigene references: Hs135201, Hs87280, Hs122983, Hs146128, Hs105242, Hs116424, Hs61309, Hs151708, Hs87296 and Hs132289) present on the BAC hb87b10 were studied in the search for a more complete complementary DNA (cDNA) sequence. For IBD1prox, the clones available in public libraries were sequenced and the sequences were organized with respect to one another. For IBD1, a peripheral blood complementary DNA library (Stratagene human blood cDNA lambda zapexpress, ref. 938202) was screened with the PCR products generated from known ESTs according to the methods proposed by the manufacturer. The sequence of the cDNAs thus identified was then used for further screening of the cDNA library, and so on, until the presented cDNA was obtained.

The EST hs135201 (UniGene) made it possible to identify a cDNA not appearing in the available genetic databases (Genbank). It therefore corresponds to a new human gene. Comparison of the sequence of the cDNA and of the genomic DNA showed that this gene consists of 11 exons and 10 introns. An additional exon, positioned 5′ to the cDNA identified, is predicted by analysis of the sequence with the GRAIL program. These exons are very homologous to the first exons of the CARD4/NOD1 gene. Taking into consideration all of the exons identified and the putative additional exon, this new gene appears to have a genomic structure very close to that of CARD4/NOD1. Moreover, a transcription initiation site appears upstream of the first putative exon. For all of these reasons, the putative exon was considered to contribute to this new gene. The cDNA reproduced in the annex (SEQ ID No. 1) therefore comprises all of the identified sequence plus the sequence predicted by the computer modeling, the complementary DNA beginning arbitrarily at the first ATG codon of the predicted coding sequence. On this basis, the gene would therefore comprise 12 exons and 11 introns. The intron-exon structure of the gene is reported on SEQ ID No. 3.

The protein sequence deduced from the nucleotide sequence comprises 1041 amino acids (SEQ ID No. 2). This sequence has not been found in the biological databases either (Genpept, pir, swissprot).

More recently, however, it has not been possible to confirm the putative exon described above. The IBD1 gene therefore effectively comprises only 11 exons and 10 introns and encodes a protein of 1013 amino acids (i.e. 28 amino acids fewer than initially determined).

The study of the deduced protein sequence shows that this gene contains three different functional domains (FIG. 3):

-   A CARD domain (Caspase Recruitment Domain) known to be involved in the interaction between proteins regulating apoptosis and activation of the NF-kappa B pathway. The CARD domain makes it possible to classify this new protein in the CARD protein family, the longest-established members of which are CED4, APAF1 and RICK.
-   An NBD domain (Nucleotide-Binding Domain) comprising an ATP-recognition site and a magnesium-binding site. The protein should therefore very probably have kinase activity.
-   An LRR domain (Leucine-Rich Repeat) presumed to participate in the interaction between proteins, by analogy with other described protein domains.

Moreover, the LRR domain of the protein makes it possible to assign the protein to a family of proteins involved in intracellular signaling and present both in plants and in animals.

Comparison of this new gene with previously identified genes available in the public databases shows that this gene is very homologous to CARD4/NOD1 (Bertin et al., 1999; Inohara et al., 1999). This homology relates to the sequence of the complementary DNA, the intron-exon structure of the gene and the protein sequence. The sequence identity of the two complementary DNAs is 58%. A similarity is also observed at the level of the intron-exon structure. The sequence homology at the protein level is of the order of 40%.

The similarity between this new gene and CARD4/NOD1 suggests that, like CARD4/NOD1, the IBD1 protein is involved in the regulation of apoptosis and of the activation of NF-kappa B (Bertin et al., 1999; Inohara et al., 1999). The regulation of cellular apoptosis and activation of NF-kappa B are intracellular signaling pathways which are essential in immune reactions. Specifically, these signal transduction pathways are the effector pathways of the proteins of the TNF (Tumor Necrosis Factor) receptor family involved in cell-cell interactions and the cellular response to the various mediators of inflammation (cytokines). The new gene therefore appears to be potentially important in the inflammatory reaction in general.

Several lines of evidence support bacteria-induced deregulation of NF-kB in Crohn's disease. First, spontaneous susceptibility to IBD in mice has been associated with mutations in Tlr4, a molecule known to bind to LPS via its LRR domain (Poltorak et al., 1998 and Sundberg et al., 1994) and to belong to the family of NF-kB activators. Second, treatment with antibiotics causes a temporary improvement in patients suffering from CD, lending credit to the hypothesis that enteric bacteria may play an etiological role in Crohn's disease (McKay, 1999). Third, NF-kB plays a pivotal role in inflammatory bowel diseases and is activated in lamina propria mononuclear cells in Crohn's disease (Schreiber et al., 1998). Fourth, the treatment of Crohn's disease is based on the use of sulfasalazine and glucocorticoids, which are both known to be NF-kB inhibitors (Auphan et al., 1995 and Wahl et al., 1998).

Even more recently, it has been shown that the IBD1 candidate gene encodes a protein very similar to NOD2, a member of the CED4/APAF1 superfamily (Ogura et al., 2000). The nucleotide and protein sequences of IBD1 and NOD2 in reality only diverge for a small portion right at the start of the two reported sequences. The tissue expression patterns of Nod2 and IBD1 are, in addition, superimposable. These two genes (proteins) can therefore be considered to be identical. It has been demonstrated that the LRR domain of Nod2 has binding activity for bacterial lipopolysaccharides (LPS) (Inohara et al., 2000) and that deletion thereof stimulates the NF-kB pathway. This result confirms the data of the invention.

The tissue expression of IBD1 was then studied by Northern blotting. A 4.5 kb transcript is visible in most human tissues. The size of the transcript is in accordance with the size predicted by the cDNA. The 4.5 kb transcript appears to be present at very low abundance in the small intestine and the colon. It is, on the other hand, very strongly expressed in white blood cells. This is in agreement with clinical data on transplants which suggest that Crohn's disease is potentially a disease associated with circulating immune cells. In fact, bowel transplantation does not prevent recurrence on the transplant in Crohn's disease, whereas bone marrow transplantation appears to have a beneficial effect on the progression of the disease.

Certain data also suggest alternative splicing, which may prove to be an important element in the possibility of generating mutants which may play a role in the development of inflammatory diseases.

The promoter of the IBD1 gene has not currently been identified with precision. It is, however, reasonable to think, by analogy with a very large number of genes, that this promoter lies, at least partly, immediately upstream of the gene, in the 5′ portion thereof. This genetic region contains transcribed sequences, as witnessed by the presence of ESTs (HUMGS01037, AA835524, hs.105242, SHGC17274, hs.146128, hs.122983, hs.87280). The ATCC clones containing these sequences were sequenced and analyzed in the laboratory, making it possible to demonstrate an exon and intron organization with possible alternative splicings. These data suggest the existence of another gene (named IBD1prox due to its proximity to IBD1). The partial sequence of the complementary DNA of IBD1prox is reported (SEQ ID No. 4), as is its intron-exon structure, on SEQ ID No. 6.

Translation of the cDNAs corresponding to IBD1prox results in a protein containing a homeobox. Analysis of several cDNAs of the gene suggests, however, the existence of alternative splicings. IBD1prox, according to one of the possible alternative splicings, corresponds to the anonymous EST HUMGS01037, the RNA of which is expressed more strongly in differentiated leukocytic lines than in undifferentiated lines.

Thus, it is possible that this gene may have a role in inflammation and cell differentiation. It may therefore also, itself, be considered to be a good candidate for susceptibility to IBD. The association between CD and the polymorphism ctg35ExC located on the coding sequence of IBD1prox supports this hypothesis even though this polymorphism does not cause any sequence variation at the protein level.

Finally, more recently, the existence of a genetic linkage in families suffering from Crohn's disease and not comprising any mutation in the IBD1 gene also, itself, suggests that IBD1prox has a role in addition to IBD1 in genetic predisposition to the disease.

The functional relationship between IBD1 and IBD1prox is not currently established. However, the considerable proximity between the two genes may reflect an interaction between them. In this case, the “head-to-tail” location of these genes suggests that they may have common or interdependent methods of regulation.

Example 6 Identification of IBD1 Gene Mutations in Inflammatory Diseases

In order to confirm the role of IBD1 in inflammatory diseases, the coding sequence and the intron-exon junctions of the gene were sequenced from exon 2 to exon 12 inclusive, in 70 independent individuals, namely: 50 sick individuals suffering from CD, 10 sick individuals suffering from UC, 1 sick individual suffering from Blau's syndrome and 9 healthy controls. The sick individuals studied were mostly familial forms of the disease and were often carriers of the susceptibility haplotype defined by the transmission disequilibrium studies. The healthy controls were of Caucasian origin.

It was thus possible to identify 24 sequence variants on this group of 70 unrelated individuals (table 4).

The nomenclature of the mutations reported refers to the initial sequence of the protein comprising 1041 amino acids. The more recently proposed nomenclature is easily deduced by removing 28 amino acids from the initial sequence, and therefore corresponds to a protein comprising 1013 amino acids (cf. example 5).
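The conversion between the two numberings is a fixed shift of 28 residues. The helper below is a minimal sketch of that rule; its name and the accepted variant syntax (one-letter reference residue, position, replacement) are assumptions made for illustration.

```python
import re

OFFSET = 28  # residues no longer counted once the putative first exon is dropped (example 5)


def to_short_numbering(variant: str) -> str:
    """Convert e.g. 'R703W' (1041-residue numbering) to 'R675W' (1013-residue numbering)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z*]+)", variant)
    if match is None:
        raise ValueError(f"unrecognised variant name: {variant!r}")
    ref, position, alt = match.groups()
    new_position = int(position) - OFFSET
    if new_position < 1:
        raise ValueError(f"{variant} lies within the first {OFFSET} residues")
    return f"{ref}{new_position}{alt}"
```

For instance, `to_short_numbering("G909R")` gives `"G881R"`; the reverse conversion simply adds 28 to the position.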

TABLE 4 Mutations observed in the IBD1 gene

Exon   Nucleotide variant   Protein variant   Crohn's disease   Ulcerative colitis   Healthy controls
1      Not tested
2      G417A                Silent
2      C537G                Silent
3      None
4      T805C                S269P             48/100            6/20                 3/18
4      A869G                N290S             0                 0                    1/18
4      C905T                A302V             1/100             0                    0
4      C1283T               P428L             1/100             0                    0
4      C1284A               Silent
4      C1287T               Silent
4      T1380C               Silent
4      T1764G               Silent
4      G1837A               A613T             1/100             0                    0
4      C2107T               R703W             10/10             1/20                 1/18
4      C2110T               R704C             4/10              1/20                 0
5      G2365A               R792Q             1/100             0                    0
5      G2370A               V794M             0                 1/20                 0
5      G2530A               E844K             1/10              0                    0
6      A2558G               N853S             1/100             0                    0
6      A2590G               M864V             1/100             0                    0
7      None
8      G2725C               G909R             7/100             0                    0
8      C2756A               A919D             1/100             0                    0
9      G2866A               V956I             2/100             1/20                 3/18
10     C2928T               Silent
11     3022insC             Stop              20/100            0                    0
12     None

The mutations other than silent mutations observed in each exon are reported. They are indicated by the variation in the peptide chain. For each mutation and for each phenotype studied, the number of times where the mutation is observed, related to the number of chromosomes tested, is indicated.

No functional sequence variant was identified in exons 1 to 3 (corresponding to the CARD domain of the protein). Exons 7 and 12 did not show any sequence variation either. Certain variants corresponded to polymorphisms already identified and typed for transmission disequilibrium studies, namely:

-   SNP3-2931: nucleotide variant T805C, protein variant S269P
-   ctg2931-5ag/ola: nucleotide variant T1380C (silent)
-   ctg2931-3ac/ola: nucleotide variant T1764G (silent)
-   SNP1: nucleotide variant C2107T, protein variant R703W.

Several sequence variations were silent (G417A, C537G, C1284A, C1287T, T1380C, T1764G and C2928T) and did not lead to any modification of the protein sequence. They were not studied further here.

For the 16 non-silent sequence variations, protein sequence variants were observed in 43/50 CD versus 5/9 healthy controls, and 6/10 UC. The existence of one or more sequence variation(s) appeared to be associated with the CD phenotype. Several sequence variations often existed in the same individual suffering from CD, suggesting a sometimes recessive effect of the gene for CD. On the other hand, no composite heterozygote or homozygote was observed among the patients suffering from UC or among the healthy controls.

Some non-silent variants were present both in the sick individuals suffering from UC or from CD and in the healthy individuals. They were the variants S269P, N290S, R703W and V956I located in exons 2, 4 and 9. Further information therefore appears to be necessary before selecting a possible functional role for these sequence variants.

V956I is a conservative sequence variation (aliphatic amino acids).

The sequence variant S269P corresponds to a variation in amino acid class (hydroxylated to imino acid) at the beginning of the nucleotide-binding domain. This sequence variant and CD are in transmission disequilibrium. It is in fact the polymorphism Snp3 (cf. above).

R703W results in a modification of the amino acid class (aromatic instead of basic). This modification occurs in the intermediate region between the NBD and LRR domains, which is a region conserved between IBD1 and CARD4/NOD1. A functional role may therefore be suspected for this polymorphism. This sequence variation (corresponding to the polymorphic site SNP1) is transmitted to sick individuals suffering from CD more often than at random (cf. above), confirming that this polymorphism is associated with CD. It is possible that the presence of this mutant in healthy individuals reflects incomplete penetrance of the mutation, as is expected for complex genetic diseases such as chronic inflammatory bowel diseases.

The variant R704C, located immediately next to R703W, could be identified in both CD and UC. It also, itself, corresponds to a nonconservative variation of the protein (sulfur-containing amino acid instead of basic amino acid) on the same protein region, suggesting a functional effect for R704C which is as important as that for R703W.

Other sequence variations are specific for CD, for UC or for Blau's syndrome.

Some sequence variations are, on the contrary, rare, present in one or a few sick individuals (A613T, R704C, E844K, N853S, M864V, A919D). They are always variations leading to nonconservative modifications of the protein in leucine-rich domains, at positions which are important within these domains. These various elements suggest that these variations have a functional role.

Two sequence variations (G909R and L1008P*) are found in quite a large number of Crohn's disease patients (respectively 7/50 and 16/50) whereas they are not detected in the controls or in the individuals suffering from UC.

The deletion/insertion of a guanosine at codon 1008 results in transformation of the third leucine of the alpha helix of the last LRR to proline followed by a STOP codon (L1008P*). This sequence variation therefore leads to an important modification of the protein: decrease in size of the protein (protein having a truncated LRR domain) and modification of a very conserved amino acid (leucine). This sequence modification is associated with CD, as witnessed by a transmission disequilibrium study in 16 families carrying the mutation (P=0.008).

The mutation G909R occurs on the last amino acid of the sixth LRR motif. It replaces an aliphatic amino acid with a basic amino acid. This variation is potentially important given the usually neutral or polar nature of the amino acids in the terminal position of the leucine-rich motifs (both for IBD1 and for NOD1/CARD4) and the conserved nature of this amino acid on the IBD1 and NOD1/CARD4 proteins.

In Blau's syndrome, the sick individuals (n=2) of the family studied carried a specific sequence variation (L470F) located in exon 4 and corresponding to the NBD domain of the protein. In this series, this sequence variant was specific for Blau's syndrome.

In UC, several sequence variants not found in healthy individuals were also identified. The proportion of sick individuals carrying a mutation was smaller than for CD, as expected given the less strongly established linkage between IBD1 and UC, and the supposedly less genetic nature of the latter disease. Some sequence variations were common to CD and to UC (R703W, R704C). Others, on the other hand, appeared to be specific for UC (V794M). This observation makes it possible to confirm that CD and UC are diseases which, at least partly, share the same genetic predisposition. It lays down the foundations of a nosological classification for IBDs.

The study of the sequence variants of the IBD1 gene has therefore made it possible to identify several variants having a very probable functional effect (for example: truncated protein) and associated with Crohn's disease, with UC and with Blau's syndrome.

The promoter of the gene is not currently determined. In all probability, however, it is located in the 5′ region upstream of the gene. According to this hypothesis, the sequence variants observed in this region may have a functional effect. This may explain the very strong association between CD and certain polymorphic loci, such as ctg35ExC or Ctg25Ex1.

The invention thus provides the first description of mutations in the family of genes containing a CARD domain in humans. The frequency of these mutations in various inflammatory diseases shows that the IBD1 gene has an essential role in normal and pathological inflammatory processes. This invention provides new paths of understanding and of research in the field of the physiopathology of normal and pathological inflammatory processes. As a result, it makes it possible to envision the development of new pharmaceutical molecules which regulate the effector pathways controlled by IBD1 and which are useful in the treatment of inflammatory diseases and in the regulation of inflammatory processes in general.

Example 7 Bases for a Biological Diagnosis of Susceptibility to Crohn's Disease

More recently, 457 independent patients suffering from Crohn's disease, 159 independent patients suffering from ulcerative colitis and 103 healthy controls were studied in the search for mutations. This study made it possible to confirm the mutations previously reported and to identify additional mutations, reported in FIG. 4. The main mutations were then genotyped in 235 families suffering from Crohn's disease. This more recent study is reported using, as reference, the shorter protein sequence (1013 amino acids, see example 5), but the prior nomenclature for the mutations is easily deduced from the latter by adding 28 to the number indicating the position of the amino acids.

Among the 5 most common mutations, the conservative mutation V928I (formerly V956I) is not significantly associated with one or the other of the inflammatory bowel diseases, and does not therefore appear to have an important role in the disease.

The mutation S241P (formerly S269P) is in linkage disequilibrium with the other main mutations and does not appear to play an important role, by itself, in susceptibility to inflammatory bowel diseases (data not shown).

Conversely, the other 3 mutations, R675W (formerly R703W), G881R (formerly G909R) and 980fs (formerly L1008P*), are significantly associated with Crohn's disease but not with ulcerative colitis (cf. below). The location in the LRR, or in its immediate proximity, of the 3 common mutations argues very strongly in favor of a functional mechanism involving this protein domain, probably via a defect in negative regulation of NF-kB by the mutated protein. The other mutations are rarer (FIG. 4). These cumulative mutations are present in 17% of the individuals suffering from Crohn's disease versus, respectively, 4% and 5% of the healthy individuals or individuals suffering from ulcerative colitis. A large number of rare mutations are also located in the LRR.

The intrafamily studies of the three polymorphisms most common in Crohn's disease show that all three are associated with the disease (table 5). As expected for a mutation supposed to be very deleterious, the polymorphism most strongly associated is the truncating mutation. These three polymorphisms are independently associated with Crohn's disease, since it was not possible to identify, in 235 families, chromosomes carrying more than one of these three mutations. The independent nature of these associations considerably supports the hypothesis that the IBD1 gene is clearly involved in genetic predisposition to Crohn's disease.

TABLE 5 Study of the 3 common polymorphisms of IBD1 in 235 families suffering from Crohn's disease

MUTATION   VALUE p OF THE PDT TEST
R675W      0.001
G881R      0.003
980fs      0.000006

The case-control studies confirm this association (table 6). They show that the mutations most common in Crohn's disease are not common in ulcerative colitis.

TABLE 6 Case-control study of the 3 common polymorphisms of IBD1 in inflammatory bowel diseases

                     No. OF        FREQUENCY OF THE ALLELE AT RISK       TOTAL ALLELES
                     CHROMOSOMES   R675W      G881R      980fs           AT RISK
                     STUDIED
Healthy controls     206           0.04       0.01       0.02            0.07
Ulcerative colitis   318           0.03       0.00       0.01            0.05
Crohn's disease      936           0.11       0.06       0.12            0.29

The study of the dose-effect of these mutations shows that individuals carrying a mutation in the homozygous or composite heterozygous state exhibit a much greater risk of developing the disease than individuals who are not carrying or are heterozygous for these mutations (table 7).

TABLE 7 Relative and absolute risk of Crohn's disease attributable as a function of the genotype of IBD1

In the general population, a risk of Crohn's disease of 0.001 has been taken as a reference, and it has been presumed that the mutations are in Hardy-Weinberg equilibrium.

GENOTYPE                   NO VARIANT   SIMPLE         HOMOZYGOTE   COMPOSITE
                                        HETEROZYGOTE                HETEROZYGOTE
Distribution
  Healthy                  88           15             0            0
  Ulcerative colitis       145          13             1            0
  Crohn's disease          267          133            28           40
Attributable risk of CD
  Relative risk            1            3              38           44
  Absolute risk            0.0007       0.002          0.03         0.03
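One way to see how the absolute risks of table 7 relate to the relative risks is to scale the latter so that their genotype-weighted average equals the reference population risk of 0.001. The sketch below is a reconstruction under stated assumptions, not necessarily the calculation actually performed: it takes the risk-allele frequencies of the healthy controls in table 6, assumes Hardy-Weinberg equilibrium as stated with table 7, and uses illustrative function and variable names.

```python
# Risk-allele frequencies in healthy controls (table 6) and relative risks (table 7).
ALLELE_FREQS = {"R675W": 0.04, "G881R": 0.01, "980fs": 0.02}
RELATIVE_RISK = {
    "no variant": 1,
    "simple heterozygote": 3,
    "homozygote": 38,
    "composite heterozygote": 44,
}
POPULATION_RISK = 0.001  # reference risk of Crohn's disease stated with table 7


def genotype_frequencies(allele_freqs):
    """Hardy-Weinberg genotype classes for several distinct risk alleles of one gene."""
    freqs = list(allele_freqs.values())
    q = sum(freqs)                      # chromosome carries any risk allele
    p = 1 - q                           # chromosome carries no risk allele
    homozygote = sum(f * f for f in freqs)
    composite = q * q - homozygote      # two different risk alleles
    return {
        "no variant": p * p,
        "simple heterozygote": 2 * p * q,
        "homozygote": homozygote,
        "composite heterozygote": composite,
    }


def absolute_risks(allele_freqs, relative_risk, population_risk):
    """Scale relative risks so their genotype-weighted mean equals the population risk."""
    geno = genotype_frequencies(allele_freqs)
    mean_rr = sum(geno[g] * relative_risk[g] for g in geno)
    baseline = population_risk / mean_rr
    return {g: baseline * relative_risk[g] for g in geno}


risks = absolute_risks(ALLELE_FREQS, RELATIVE_RISK, POPULATION_RISK)
```

Under these assumptions the computed values round to the absolute risks given in table 7 (0.0007, 0.002, 0.03 and 0.03).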

The studies mentioned above confirm the prior preliminary data and provide the detailed bases for a biological diagnosis of Crohn's disease by studying the IBD1 variants. In fact, this work:

-   1) defines the mutations, the frequency of which is greater than 0.001 in a mixed Caucasian population;
-   2) defines the frequency of the mutations observed and makes it possible to define 3 main mutations associated with Crohn's disease. Thus, it is possible, by virtue of this work, to define a strategy for studying the gene in order to search for morbid variants, namely: firstly, typing the 3 main mutations; secondly, searching for mutations in the last 7 exons; thirdly, searching for other sequence variants;
-   3) defines the practical modalities for searching for these mutations by pointing out their position and their nature. In fact, it is then easy for those skilled in the art to develop typing and sequencing methods according to their personal expertise. Mention may in particular be made of the possibility of genotyping the three main mutations by PCR followed by enzymatic digestion and electrophoresis, study of the migration profiles by dHPLC, DGGE or SSCP, oligoligation, microsequencing, etc.;
-   4) demonstrates the independence of the most common mutations which are not observed on the same chromosome in this extended and varied population. This information makes it possible to reliably classify the individuals who are composite heterozygotes (having two mutations) as carriers with a double dose of intragenic variations;
-   5) demonstrates that the great majority of the mutations only lead to a null or minimal effect on the risk of ulcerative colitis. This result makes it possible to envision assisting the clinician in the differential diagnosis between these two diseases. In fact, in approximately 10% of cases, inflammatory bowel diseases remain unclassified despite biological, radiological and endoscopic examination;
-   6) defines a relative and absolute risk of disease for the most common genotypes. This result lays down the foundations of a predictive diagnosis potentially useful in an approach of preventive monitoring and intervention in populations at risk, in particular the relatives of sick individuals;
-   7) demonstrates the existence of a dose-effect for the IBD1 gene and confirms the partly recessive nature of genetic predisposition to Crohn's disease. It therefore makes it possible to lay the foundations for genetic counseling and for intra-familial preclinical diagnosis.

Finally, it should be noted that an additional mutation of the NBDdomain was isolated in a second family carrying Blau's syndrome. Therareness of the two events in 2 different families is sufficient toconfirm the involvement of this gene in Blau's syndrome and ingranulomatous diseases in general.

All of these data provide a diagnostic tool which is directly applicableand of use to the practitioner in his or her daily practice.

The IBD1prox gene, located in the promoter region of IBD1, and the partial sequence of which is disclosed in the present invention, may also, itself, have an important role in the regulation of cellular apoptosis and of the inflammatory process, as suggested by its differential expression in mature cells of the immune system. The strong association reported in this work between the polymorphism marker ctg35ExC (located in the transcribed region of the gene) and Crohn's disease also pleads very strongly in favor of this hypothesis.

Inflammatory bowel diseases are complex genetic diseases for which, until now, no susceptibility gene had been identified with certainty. The invention has made it possible to identify the first gene for susceptibility to Crohn's disease, using a positional cloning (or reverse genetics) approach. This is the first genetic location obtained using such an approach for a complex genetic disease, which demonstrates its usefulness and its feasibility, at least in certain cases in complex genetic diseases.

The present invention also relates to a purified or isolated nucleic acid, characterized in that it encodes a polypeptide possessing a continuous fragment of at least 200 amino acids of a protein chosen from SEQ ID No. 2 and SEQ ID No. 5.

REFERENCES

-   Auphan et al. (1995), Science 270, 286-90.
-   Asakawa et al. (1997), Gene, 191, 69.
-   Becker et al. (1998), Proc. Natl. Acad. Sci. USA, 95, 9979.
-   Bertin et al. (1999), J. Biol. Chem., 274, 12955.
-   Buckholz (1993), Curr. Op. Biotechnology 4, 538.
-   Carter (1993), Curr. Op. Biotechnology 3, 533.
-   Cho et al. (1998), Proc. Natl. Acad. Sci. USA, 95, 7502.
-   Duck et al. (1990), Biotechniques, 9, 142.
-   Edwards and Aruffo (1993), Curr. Op. Biotechnology, 4, 558.
-   Epstein (1992), Medecine/Sciences, 8, 902.
-   Guatelli et al. (1990), Proc. Natl. Acad. Sci. USA 87: 1874.
-   Hugot et al. (1996), Nature, 379, 821.
-   Inohara et al. (1999), J. Biol. Chem., 274, 14560.
-   Inohara et al. (2000), J. Biol. Chem.
-   Kievits et al. (1991), J. Virol. Methods, 35, 273.
-   Kim et al. (1996), Genomics, 34, 213.
-   Köhler and Milstein (1975), Nature, 256, 495.
-   Kwoh et al. (1989), Proc. Natl. Acad. Sci. USA, 86, 1173.
-   Landegren et al. (1988), Science 241, 1077.
-   Lander and Kruglyak (1995), Nat. Genet., 11, 241.
-   Luckow (1993), Curr. Op. Biotechnology 4, 564.
-   Martin et al. (2000), Am. J. Hum. Genet. 67: 146-54.
-   Matthews et al. (1988), Anal. Biochem., 169, 1-25.
-   McKay (1999), Gastroenterol. 13, 509-516.
-   Miele et al. (1983), J. Mol. Biol., 171, 281.
-   Needleman and Wunsch (1970), J. Mol. Biol. 48: 443.
-   Ogura et al. (2000), J. Biol. Chem.
-   Olins and Lee (1993), Curr. Op. Biotechnology 4: 520.
-   Perricaudet et al. (1992), La Recherche 23: 471.
-   Pearson and Lipman (1988), Proc. Natl. Acad. Sci. USA 85: 2444.
-   Poltorak et al. (1998), Science 282, 2085-8.
-   Rioux et al. (1998), Gastroenterology, 115: 1062.
-   Rohlmann et al. (1996), Nature Biotech. 14: 1562.
-   Rolfs, A. et al. (1991), Berlin: Springer-Verlag.
-   Rouquier et al. (1994), Anal. Biochem. 217, 205.
-   Sambrook et al. (1989), Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Lab., Cold Spring Harbor, N.Y.
-   Satsangi et al. (1996), Nat. Genet., 14: 199.
-   Schreiber et al. (1998), Gut 42, 477-84.
-   Segev (1992), Kessler C. Springer Verlag, Berlin, N.Y., 197-205.
-   Smith and Waterman (1981), Adv. Appl. Math. 2: 482.
-   Stewart and Young (1984), Solid phase peptide synthesis, Pierce Chem. Company, Rockford, Ill., 2nd ed. (1984).
-   Spielman et al. (1993), Am. J. Hum. Genet., 52, 506.
-   Sundberg et al. (1994), Gastroenterology, 107, 1726-35.
-   Temin (1986), Retrovirus vectors for gene transfer. In Kucherlapati R., ed. Gene Transfer, New York, Plenum Press, 149-187.
-   Tromp et al. (1996), Am. J. Hum. Genet., 59: 1097.
-   Wahl et al. (1998), J. Clin. Invest. 101, 1163-74.
-   Walker (1992), Nucleic Acids Res. 20: 1691.

1-25. (canceled)
26. A purified or isolated nucleic acid probe, wherein said nucleic acid probe is specific for a variant nucleic acid sequence selected from the group consisting of SEQ ID NO:3 having a C to T mutation at nucleotide 16467, SEQ ID NO:3 having a G to C mutation at nucleotide 27059, and SEQ ID NO:3 having a C insertion at nucleotide 34296.
27. The nucleic acid probe of claim 26, wherein said nucleic acid probe is specific for SEQ ID NO:3 having a C to T mutation at nucleotide 16467.
28. The nucleic acid probe of claim 26, wherein said nucleic acid probe is specific for SEQ ID NO:3 having a G to C mutation at nucleotide 27059.
29. The nucleic acid probe of claim 26, wherein said nucleic acid probe is specific for SEQ ID NO:3 having a C insertion at nucleotide 34296.
30. The nucleic acid probe of claim 26, wherein said nucleic acid probe is about 15 to 30 nucleotides in length.
31. The nucleic acid probe of claim 26, wherein said nucleic acid probe comprises a detectable label.
32. The nucleic acid probe of claim 31, wherein said detectable label is selected from the group consisting of a radioactive isotope, a ligand, and a luminescent agent.
33. The nucleic acid probe of claim 32, wherein said ligand is selected from the group consisting of biotin, avidin, streptavidin, digoxigenin, a hapten, and a dye.
34. The nucleic acid probe of claim 32, wherein said luminescent agent is selected from the group consisting of a radioluminescent agent, a chemiluminescent agent, a bioluminescent agent, a fluorescent agent, and a phosphorescent agent.
35. The nucleic acid probe of claim 26, wherein said nucleic acid probe is immobilized on a support.
36. A kit comprising the nucleic acid probe of claim 26.
37. The kit of claim 36, wherein said kit comprises nucleic acid probes specific for at least two of said variant nucleic acid sequences.
38. The kit of claim 36, wherein said kit comprises nucleic acid probes specific for three of said variant nucleic acid sequences.
39. A DNA chip comprising the nucleic acid probe of claim 26.
40. The DNA chip of claim 39, wherein said DNA chip comprises nucleic acid probes specific for at least two of said variant nucleic acid sequences.
41. The DNA chip of claim 39, wherein said DNA chip comprises nucleic acid probes specific for three of said variant nucleic acid sequences.